
COMPUTER TECHNOLOGY INSTITUTE
TECHNICAL REPORT No. TR99/09/06
1999
Semantic Abstractions in the Multimedia Domain
Elina Megalou1,2 & Thanasis Hadzilacos1
1 Computer Technology Institute, and
2 Dept. of Computer Engineering and Informatics, University of Patras, Greece
Kolokotroni 3, GR-26221, Patras, Greece
e-mail: {megalou, thh}@cti.gr
http://www.cti.gr/RD3/
Abstract -- Information searching by exactly matching content is traditionally a strong point of machine searching; it is not, however, how human memory works, and it is rarely satisfactory for advanced retrieval tasks in any domain, multimedia in particular, where the presentational aspects can be as important as the semantic information content of multimedia applications. A combined abstraction of their conceptual and presentational characteristics, leading on the one hand to their conceptual structure (with classic semantics of the real world modeled by entities, relationships and attributes) and on the other to the presentational structure (including media type, logical structure, temporal synchronization, spatial (on the screen) "synchronization", and interactive behavior), is developed in this paper. Multimedia applications are construed as consisting of "Presentational Units": elementary (with media object, play duration and screen position) and composite (recursive structures of PUs in the temporal, spatial and logical dimensions). The fundamental concept introduced is that of Semantic Multimedia Abstractions (SMA): qualitative abstract descriptions of multimedia applications in terms of their conceptual and presentational properties at an adjustable level of abstraction. SMAs, which can be viewed as metadata, form an abstract space to be queried. A detailed study of possible abstractions (from multimedia applications to SMAs and SMA-to-SMA), a definition and query language for Semantic Multimedia Abstractions (SMA-L) and the corresponding SMA model (equivalent to extended OMT), as well as an implementation of a system capable of wrapping the presentational structure of XML-based documents, complete this work, whose contribution lies in the classically fruitful boundary between AI, software engineering and database research.
Index Terms -- Multimedia data model, semantic modeling, abstraction, semantic multimedia abstraction, spatio-temporal retrieval, multimedia query language.
1. Introduction
Looking for a piece of information among many is one of the basic tasks in computer science: searching. The traditional approach to searching makes two assumptions: first, that we know exactly what we are looking for; second, that we can organize the data where we are looking to find it -the so-called search space. These assumptions are not true in all searching situations. For instance, when as humans we try to recall information in our minds, we may have an approximate, fuzzy or incomplete description of it at a non-uniform level of detail; as for the "search space", if it does have an organization, it escapes us. This paper is about looking up multimedia applications; on CDs or on the Web they form a huge, loosely organized (if at all), distributed search space with a wealth of information.
“Get me multimedia applications on music teaching for string instruments; I recall seeing one with audio and video
(or was it a series of slides?) at the same time; half of the remaining screen was filled with music score and the rest
with textual instructions or explanations; it included Paganini’s Moto perpetuo”. This is the type of queries we deal
with. Why? Because from cognitive science we know that this is how people remember. But also because –and this
has been the original motivation of this work- this is how multimedia applications may be specified and such
specifications -available from the design phase or to be reconstructed- would be a most suitable search space.
The name of the game is abstraction: a fundamental human cognitive process and skill [27], and a basic mathematical and computer science tool for problem solving in general [17], [22] and searching in particular [21]. It implies the transformation of our target objects by shedding some of their properties, those deemed irrelevant for the task at hand. Such a transformation may be so drastic that it changes the domain of discourse: from land parcels to rectilinear two-dimensional figures is the classic abstraction better known as Euclidean geometry. Properties such as color, weight and substance become irrelevant, and objects of the real world are mapped into their equivalence classes. All this is discussed as relevant background research in Section 3.3.1.
From the example query, from our work in specification of multimedia titles in series (a software engineering
methodology developed to facilitate and automate the generation of classes of similar multimedia
applications)[16],[37],[38],[53] and from a wealth of research during the past few years [2],[7], [10],
[12],[19],[29], [39], [55] it is clear that we need to abstract on the conceptual and presentational characteristics of
multimedia applications at the same time. The conceptual structure can be neatly captured with classic semantic
models: entities, relationships and attributes are the basic tools augmented with higher level structuring concepts
(such as aggregation, grouping, and classification) for which both theory and tools are reasonably well developed
[46], [9], [42]. For the presentational structure of multimedia applications more is needed although a lot has been
done [12], [19],[33],[34],[36],[54]. Our analysis (Sections 2.2.1 and 2.2.2) indicates media type, logical structure,
temporal synchronization, spatial (on the screen) “synchronization”, and interactive behavior as being the main
aspects.
Our contribution to this analysis is the concept of Presentational Unit. A structurally scalable unit, the PU can be
elementary (just a media object positioned in time and place within a multimedia presentation, i.e. augmented with
playout duration and screen position) or composite (recursively consisting of simpler ones combined logically,
synchronized temporally or put together on the screen). This is detailed in Section 2.2.3.
The main contribution of the paper regards the analysis of abstractions in the multimedia domain. Semantic
Multimedia Abstractions (SMA) are qualitative abstract descriptions of a multimedia application in terms of its
conceptual and presentational properties at an adjustable level of abstraction (Section 2.3). For instance, while a
multimedia application needs absolute temporal durations for its media objects, an SMA would only retain their
relative temporal relationships. SMAs are metadata and they form an abstract space rather suitable for searching.
The base abstraction leads from representations of multimedia applications (in XML for example) to SMAs; from
then on hierarchies of abstract spaces can be created using SMA-to-SMA transformations which move up
abstraction level by relaxing constraints, and wrapping temporal, spatial or logical structures. The admissible types
of abstractions (with minor dependencies on the language used) are studied in detail in Sections 3.3 and 3.4
whereas the SMA definition and query language is given in Table 1 in BNF.
Although not the subject of this paper, a system has actually been developed to support the SMA model –itself
equivalent to OMT suitably extended with compatible definitions of temporal and spatial aggregation and
grouping. The system has been used to exemplify our ideas using XML as the base document language and Section
3.5 concludes the paper with an illustrative example.
Some of the most interesting research takes place at the boundaries of traditional computer science areas. This work is a contribution on one such classically fertile boundary, between databases, artificial intelligence and software engineering [9].
2. Semantic Abstractions for specifying, designing and constraining multimedia applications
2.1 Automating the generation of multimedia applications in series: Motivation and the MULTIS systems
We started studying abstraction in the multimedia domain while approaching a software engineering problem: how
to specify a set of thematically and structurally “similar” multimedia applications (a multimedia series) in order to
design and build a special-purpose authoring environment which facilitates the development of each multimedia
application of the series. Towards this end, a production methodology called MULTIS (Multimedia Titles In
Series) [16] was proposed consisting of the following steps: initially, domain knowledge providers and multimedia
designers identify the desired common properties of all multimedia applications in the series and produce a generic
specification –called “Model Title Specification”- of the multimedia series in terms of them; based on the Model
Title Specification, computer engineers build a special-purpose authoring system –called a MULTIS system- that
embodies such properties in its own structure; using the MULTIS system, end-users “fill-in” the particular
properties of each personalized multimedia application and ask for the automatic generation of the application’s
source code.
Hence, the generic specification of a multimedia series is a fundamental issue in the MULTIS approach. The Model Title Specification captures both the knowledge of the application domain (concepts and relations) and the presentational and behavioral properties of the multimedia applications, in a way generic enough to represent the whole series while adequately focused to result in an easy-to-use MULTIS system.

[Figure 1: The MULTIS Layered Architecture -- Model Title Specification (Specifications Layer), Multimedia Database (Data Layer), Application Database (Application Layer), Title Editor (Control Layer), Code Generator (Presentation Layer)]

For each multimedia series, the corresponding MULTIS system consists of a custom multimedia database for the organization and storage of the domain-specific multimedia data and an editing environment for the definition (instantiation), storage and automatic generation of each multimedia application in the series. From the database point of view, MULTIS systems are special-purpose database systems enhanced with code generation functionality. The
Model Title Specification guides the development of two database schemata, the schema of the multimedia
database and that of the application database and it determines the design of an application-specific user interface
for the editing and instantiation of each personalized multimedia application. Both databases are modeled with
object-oriented models; each multimedia application forms a network of objects which reflects its specific
structure, behavior and data and is stored as an independent database.
In a MULTIS system, the knowledge about the presentational and structural characteristics of the multimedia
applications in the series is embedded in each object class and it is used in the code generation process; each object
“knows how to present itself”, it produces its code and propagates a pertinent message to the appropriate objects of
the network. Figure 1 depicts the MULTIS layered architecture; layers communicate with adjacent ones but
operate independently allowing the separation of the multimedia data and their presentation.
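A minimal sketch of this "each object knows how to present itself" scheme follows (hypothetical class and output format, not the actual MULTIS generator): every object of the network emits its own code fragment and propagates the generation message to its constituent objects.

from typing import Optional

class PresentationalObject:
    """A node in the object network of a multimedia application."""
    def __init__(self, name: str,
                 children: Optional[list["PresentationalObject"]] = None):
        self.name = name
        self.children = children or []

    def generate_code(self, indent: int = 0) -> str:
        """Produce this object's code and propagate the request to its constituents."""
        own = " " * indent + f"create {self.name}"        # hypothetical target syntax
        parts = [child.generate_code(indent + 2) for child in self.children]
        return "\n".join([own] + parts)

scene = PresentationalObject("CompanyScene", [
    PresentationalObject("IntroVideo"),
    PresentationalObject("OrgChart", [PresentationalObject("DeptHotspot")]),
])
print(scene.generate_code())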
The MULTIS approach was validated in practice within the context of the EEC funded project “Valmmeth” [53]
whose aim was to demonstrate the feasibility and benefits of publishing series of multimedia applications using
this technology. Based on the Model Title Specifications of four multimedia series (foreign language training
applications, business presentations, point of information systems (POISs) and medical training applications) given
by domain experts and multimedia designers, the corresponding four MULTIS systems were developed and tested
at pilot sites in Greece, Belgium and the UK.
2.2 Identifying the properties of multimedia applications in a semantic abstract representation
2.2.1 The specification of a multimedia series as an abstraction process
At the specification and design stage of a software system, the goal is to capture the desired "functionality" ignoring implementation details. Taking the MULTIS example, when developing a Model Title Specification the goal is to specify the functionality of a multimedia series by capturing only the desired common properties of the multimedia applications that the corresponding MULTIS system is able to produce, and ignoring those properties in terms of which these multimedia applications are allowed to differ. In other words, the Model Title Specification constrains the applications to be generated to those considered "identical" in terms of certain properties.
Hence, specifying a multimedia series is an abstraction process; the particular abstraction goal determines both the properties ignored at this stage and the selection of the "right abstraction level". The "functionality" of multimedia applications pertains to their conceptual and presentational structure and behavior: real-world objects and relationships involved, spatio-temporal structure and synchronization during presentation, control flow and behavior on various events, etc. For instance, if the temporal aspects of multimedia applications are of particular importance and should be specified in detail, the abstraction process may ignore details such as the multimedia content layout and text formatting and may keep only the properties relevant to the temporal dimension.
The decision on the “right abstraction level” in the MULTIS example is guided mainly by the desired diversity -or
similarity degree- of produced multimedia applications and is a trade-off between the complexity of a MULTIS
system and the range and diversity of multimedia applications that the system is able to produce: a high abstraction level leads to too open MULTIS systems, which tend to resemble general-purpose authoring tools. For instance, if the temporal properties of a multimedia scene are specified using exact time instances and time distances from a specific time point -e.g. a video starts 5 seconds after the beginning of the presentation, and 3 seconds into its playout an image appears for 5 seconds- then "valid" scenes are only those that conform to these strict temporal constraints; hence, there is no diversity of scenes in terms of their temporal synchronization. At a higher abstraction level, similar specifications could be given using temporal relations instead of time instances, allowing several multimedia scenes to "fall under" these temporal constraints, e.g. "a video starts some time after the beginning of the scene while two images appear sequentially and in parallel with the video".
2.2.2 The Model Title Specification as a Semantic Abstract Description of a set of multimedia applications
In the MULTIS approach the Model Title Specification captures the following:
a. The conceptual structure of multimedia applications (application layer), which consists of the "real-world" objects of the application domain (objects that exist in the real, outside world -we call them conceptual structure objects or conceptual units), their attributes and relationships. For instance, in a multimedia series of tourist guides, a city, a hotel, or a museum are conceptual structure objects; a relationship could be "each city has one or more hotels".
b. The presentational structure of multimedia applications (presentation layer), which consists of the presentational objects (objects that appear "on the screen" during the execution of the multimedia application -we call them presentational units), their attributes and relationships. The data types of multimedia content (called media objects) that each real-world object "is presented by" are the basic objects for building presentational units. The presentational structure reflects how the conceptual structure is mapped onto a multimedia application. For instance, if the conceptual structure includes that "a company consists of a number of departments", one possible presentational structure is "a company is presented by a screen whose background is an organizational chart with a number of departments; each department is presented by one introductory screen, accessible through active hotspots on the organizational chart". Note that the same conceptual structure can be mapped onto many different presentational structures depending on the way the multimedia content of the conceptual objects is assembled and structured in a multimedia application.
2.2.3 On the Presentational Structure of Multimedia Applications
To define the properties of multimedia applications captured by presentational structure we first define the concept
of the Presentational Unit (PU).
Definition 2.1: An elementary Presentational Unit (PU) is a triplet pu = (m, τ, p) where
- m is a media object,
- τ is a time interval, called the "presentational duration" of pu (possibly indefinite, e.g. for a web page that stays on the screen until an outside event occurs),
- p is a region on the screen, called the "presentational position" of pu (possibly indefinite, e.g. for an object of type "audio").
The “presentational duration” of an elementary PU pu, denoted with pu.τ, represents the temporal interval when
the pu is active during an execution of a multimedia application (e.g. it appears in a presentation). A temporal
interval is defined by two end points or time instances [3], [34].
The “presentational position” of an elementary pu, denoted with pu.p, represents the screen portion the pu
occupies during an execution of a multimedia application; the domain of presentational position is the set of 2D
polygons [13].
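To make the triplet concrete, the following minimal Python sketch (not part of the MULTIS/SMA implementation; all names are hypothetical) represents an elementary PU with its presentational duration as an interval of two time instances and its presentational position as an axis-aligned rectangle, a simple special case of the 2D polygons mentioned above.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Interval:
    """A temporal interval defined by its two end points (time instances)."""
    start: float
    end: float

@dataclass(frozen=True)
class Region:
    """An axis-aligned rectangular screen region (a simple 2D polygon)."""
    x1: float
    y1: float
    x2: float
    y2: float

@dataclass(frozen=True)
class ElementaryPU:
    """An elementary Presentational Unit pu = (m, tau, p)."""
    m: str                            # media object identifier, e.g. "intro_video.mpg"
    tau: Optional[Interval] = None    # presentational duration; None if indefinite
    p: Optional[Region] = None        # presentational position; None e.g. for audio

# Example: a video playing from second 0 to 30 in the upper half of a 640x480 screen.
intro = ElementaryPU("intro_video.mpg", Interval(0, 30), Region(0, 0, 640, 240))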
Definition 2.2: A composite PU is defined inductively by combining PUs in three orthogonal dimensions or views: Logically, Temporally and Spatially. The presentational duration of a composite PU is a set of temporal intervals representing the presentational durations of its constituent PUs. The presentational position of a composite PU is a set of screen regions representing the presentational positions of its constituent PUs.
The following properties are captured by the Presentational Structure of a PU:
i. The types of the constituent PUs (media objects and composite PUs), disregarding the specific content, e.g. two pictures (PUs) have the same type.
ii. The logical structure of the PU (including constituent PUs), disregarding specific instances, e.g. two slide shows, one of 10 slides and the other of 25, have the same logical structure.
iii. The temporal synchronization of the PU (including constituent PUs), disregarding the specific durations and considering only the qualitative temporal information that is considered significant and relevant to the specific abstraction goal, e.g. two pieces of synchronized audio-video (PUs) of different durations have the same temporal synchronization. We will refer to this as the "Temporal Structure" of a PU.
iv. The spatial synchronization -the on-screen relative positioning- of the PU (including constituent PUs), disregarding e.g. specific sizes and taking into account only the qualitative spatial information that is considered significant and relevant to the specific goal, e.g. two pairs of non-overlapping photos (PUs) have the same spatial synchronization. We will refer to this as the "Spatial Structure" of a PU. Note that the spatial synchronization of a PU is meaningful for visual PUs, e.g. sub-scenes, web pages, PUs of type image or video.
v. The interactive behavior of the PU (including constituent PUs), disregarding specific events, conditions and actions and considering only types (classes) of the above, e.g. two buttons, one activated by the "mouse click" event and the other by the "mouse over" event, have the same interactive behavior.
Dropping or under-specifying one of the axes (logical, temporal, spatial) creates a Presentational View.
Hence, a PU is a structured composition of simpler PUs which is semantically meaningful under a Presentational
View; depending on the Presentational View, a PU can be characterized as a Logical, Temporal or Spatial PU.
Examples: a) Let the specification of a multimedia application for company presentations include that “the
application starts with the company introductory video and while the video plays, various images appear on the
screen; when the video finishes, an image of a company organization chart appears”. The video duration –a
temporal interval- can be considered to define a PU whose “meaning” is “introduction”;
the duration that the
company organization chart stays on the screen defines another PU. The Macromedia Director [35] authoring
paradigm is based on temporal PUs: a time frame (a temporal interval) in the score window, or a set of such frames
can be considered a temporal PU. A temporal PU is defined by a temporal interval within which its constituent PUs
appear. b) A space-oriented specification of the above PUs may include: "a company introductory scene contains a
video and a slide-show of images; a second scene contains an organization chart”. The two scenes are two spatial
PUs. A web page is another example of a spatial PU. c) A structured web document consisting of a header, one or
more author names and a set of paragraphs is an example of a logical PU.
Definition 2.4: The Presentational Structure of a multimedia application consists of the set of its constituent PUs
and their relationships during presentation; the relationships among PUs determine the control flow of the
application. In many cases, a multimedia application is a single PU.
Examples: a) In the “company presentation” example given above, the introductory screen and the department
screens are PUs linked together via active hotspots on the organizational chart; the relationship can be
characterized as a link-oriented relationship between PUs. Link-oriented relationships are also used in web-based
applications that consist of a number of hyper-linked spatial PUs (web pages). b) A multimedia presentation that
“plays” in automatic mode is considered a set of PUs with time-oriented relationships.
2.3 Semantic Multimedia Abstractions (SMAs) and the SMA model
Definition 2.5: A Semantic Multimedia Abstraction (SMA) is a qualitative abstract description of a multimedia
application in terms of the properties captured by the conceptual and presentational structure of multimedia
applications (defined in Sect. 2.2.2 and 2.2.3); we call such properties conceptual and presentational properties of
multimedia applications at the semantic level.
A number of models for multimedia information management that address certain aspects of multimedia applications have been developed. Most of them emphasize individual media -images and video- following various modeling approaches, e.g. the knowledge-based semantic image model proposed in [10], a four-layered model that uses the hierarchical structure TAH (Type Abstraction Hierarchy) for approximate query answering by image feature and content. Models for multimedia documents address the issues of spatio-temporal
synchronization and of structuring of multimedia documents: Time intervals and Allen’s temporal relationships [3]
and 2D spatial relationships [13] are extensively used for modeling the spatio-temporal synchronization of
monomedia data and of more complex multimedia structures and for representing temporal and spatial semantics
e.g. [34], [10], [11], [54]. Language-based models such as SGML[49], XML[14] and object-oriented approaches
[29], [48] have also been developed for modeling multimedia documents.
For the representation of SMAs we need a “model” based on well established concepts and techniques, able to
capture in a uniform way the conceptual and presentational properties of multimedia applications at the semantic
level. Al-Khatib et al. in [2] review, categorize and compare recent semantic data models for multimedia data at
different levels of granularity. According to this categorization, the model for representing SMAs should include
features from both compositional and organizational models for multimedia documents, while emphasizing multimedia databases and providing abstraction constructs for representing higher-level structures. Using, for instance, a graph-based model for multimedia applications -where nodes represent multimedia objects (simple or composite) and arcs denote the execution flow- a model for SMAs could be created by mapping a detailed graph
representing one multimedia application –one instance- onto a less detailed, generic graph whose nodes and arcs
represent classes of objects and relationships of the initial graph.
The proposed model –called “SMA model”- is based on well established semantic data models used in database
conceptual modeling and knowledge representation [8], [9], [42]. These models provide structural concepts such as
entities (objects), relationships, attributes as well as forms of data abstraction for relating concepts: classification
(grouping objects that share common characteristics into a class), aggregation (treating a collection of component
concepts as a single concept), generalization (extracting from a set of category concepts a more general concept and suppressing the detailed differences) and association (treating a collection of similar member concepts as a single
set concept) [8], [9], [42]. Our building tools will be these classic forms of abstraction, extended to the temporal
and spatial dimensions to capture the presentational properties of multimedia applications.
2.3.1 Representing conceptual properties of multimedia applications at the semantic level (SMA's conceptual structure)
For the representation of the conceptual structure of SMAs, we shall use the provided structural concepts (entities,
attributes and relationships) and forms of abstraction (classification, generalization, aggregation, association) of
semantic data models.
2.3.2 Representing presentational properties of multimedia applications (SMA's presentational structure)
Semantic Data Models use two abstraction constructs to allow the recursive formation of complex objects from
simpler ones: aggregation and grouping. For SMA’s presentational structure such models should encompass the
standard notion of “consists” for representing the logical structure of PUs (e.g. the PU “map” consists of an image
and a number of buttons), a “temporal consists” for representing the temporal structure of PUs (e.g. a PU “slide
show” consists of a temporal sequence of slides) and a “spatial consists” for representing the spatial structure of
PUs (e.g. a PU "scene" consists of two disjoint pictures and a text on the bottom of the screen). By extending these
abstraction constructs to the temporal and spatial dimensions, temporal and spatial aggregation and grouping are
defined.
2.3.2.1 Representing Temporal Structure of multimedia applications at the semantic level (SMA's temporal structure)
a. Temporal Abstraction Constructs
Let U = {pu_i | pu_i ∈ PU, 1 ≤ i ≤ n} and let pu_i.τ be the presentational duration of pu_i for i = 1...n. Let also R_t(pu.τ, pu'.τ) denote a temporal relationship R (such as "before" [3]) between pairs of presentational durations.
Temporal Aggregation
Aggregation is “the form of abstraction in which a relationship between component objects is considered as a
higher level aggregate object. Every instance of an aggregate object class can be decomposed into instances of the
component object classes, which establishes a part-of relationship between objects” [9]. E.g. a car is an aggregate
of its parts.
Temporal Aggregation is the form of abstraction in which a collection of PUs pu_i, i = 1...n with presentational durations pu_i.τ form a higher level PU pu whose presentational duration pu.τ "temporally consists of" the presentational durations pu_i.τ of its constituents (see Example 2.1). The higher level presentational unit pu is called a temporal aggregate of pu_i and its presentational duration pu.τ equals the union of the presentational durations pu_i.τ of its constituents.
The important features of a Temporal aggregate are: a) it is also a PU with a presentational duration attribute, b) the presentational durations of its constituent PUs are "within" (or play during) the presentational duration of the temporal aggregate, c) the presentational duration of the aggregate PU pu.τ does not extend before the start time or after the end time of any of its constituent PUs, d) pu.τ is a single temporal interval without "temporal" holes.
Definition 2.6: Let pu = A(pu_1, pu_2, ..., pu_n) be an aggregation of component PUs pu_1, pu_2, ..., pu_n. Then pu is a temporal aggregation of pu_1, pu_2, ..., pu_n, noted A_T, with presentational duration A_T.τ, if and only if:
A_T(pu) ⇔ ∀ pu_i ∈ A, During(pu_i.τ, pu.τ) and Equals(pu.τ, ∪_{i=1..n} pu_i.τ)
(During and Equals are temporal relationships [3] -- see Figure 2).
Example 2.1: If a scene consists of three PUs of type image, video and audio, with presentational durations pu_I.τ, pu_V.τ and pu_A.τ respectively, then:
Scene = A_T(pu_I, pu_V, pu_A) ⇔
During(pu_I.τ, scene.τ) ∧ During(pu_V.τ, scene.τ) ∧ During(pu_A.τ, scene.τ)
∧ Equals(scene.τ, pu_I.τ ∪ pu_V.τ ∪ pu_A.τ)
[Figure: a timeline in which τ_image, τ_video and τ_audio lie within and jointly cover τ_scene]
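As a sketch of how the condition of Definition 2.6 can be checked mechanically (reusing the Interval class from the sketch in Section 2.2.3; helper names are hypothetical), each constituent duration must lie within the aggregate's duration and the constituents must cover it without temporal holes:

def within(inner: Interval, outer: Interval) -> bool:
    """inner lies within outer (During, relaxed here to allow shared end points)."""
    return outer.start <= inner.start and inner.end <= outer.end

def is_temporal_aggregate(pu_tau: Interval, parts: list[Interval]) -> bool:
    """Definition 2.6: During(pu_i.tau, pu.tau) for every constituent and
    Equals(pu.tau, union of the pu_i.tau), i.e. the parts cover pu.tau with no holes."""
    if not parts or not all(within(t, pu_tau) for t in parts):
        return False
    covered = pu_tau.start
    for t in sorted(parts, key=lambda t: t.start):   # sweep in start order
        if t.start > covered:                        # a "temporal" hole
            return False
        covered = max(covered, t.end)
    return covered >= pu_tau.end

# Example 2.1: image, video and audio durations jointly covering the scene duration.
scene = Interval(0, 60)
print(is_temporal_aggregate(scene, [Interval(0, 40), Interval(10, 60), Interval(0, 60)]))  # True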
Examples of Temporal aggregations in common authoring paradigms
i. Macromedia Director [35] paradigm: Here, a PU (e.g. a multimedia scene) is typically specified as a sequence of time frames, each consisting of several elementary PUs, i.e. media objects with presentational duration and presentational position; such a PU is a temporal aggregate of its constituent PUs, with presentational duration equal to the temporal interval defined by the set of time frames, if and only if all the constituent media objects "play within" this temporal interval and for each time frame there is at least one active media object (there are no "temporal" holes). In case a PU, e.g. background music, continues to play into the succeeding time frames, the PU is not a temporal aggregate. In this paradigm a multimedia application is typically a set of inter-linked temporal aggregations.
ii. HTML / web authoring paradigm: a PU is usually determined by the media objects in a web page; such a PU can be characterized as a temporal aggregate if none of its constituent media objects plays outside -e.g. extends to the previous or the next page- the temporal interval when the web page is active.
Temporal Grouping
Grouping or Association “is a form of abstraction in which a relationship between member objects is considered
as a higher level set object. An instance of a set object class can be decomposed into a set of instances of the
member object classes and this establishes a member-of relationship between a member object and a set object”
[9].
Temporal grouping is the form of abstraction in which a collection (group) of similar PUs pu_i, i = 1...n (i.e. PUs with the same presentational structure) with presentational durations pu_i.τ, temporally related with the same -or similar- temporal relationship R_t, form a higher level PU pu whose presentational duration pu.τ "is a temporal group of" the member presentational durations pu_i.τ (see Example 2.2). The higher level PU pu is called a temporal group of pu_i with temporal relation R_t and has presentational duration pu.τ equal to the minimal cover of the pu_i.τ of its members. R_t is a temporal constraint on the set.
A group of PUs, temporally related with “similar” temporal relationships can be considered a temporal grouping if
a more generic temporal relationship is used instead. For instance, a group of PUs with temporal relationship either
“meets” or “before” can be considered a temporal grouping where the temporal relationship “sequential” holds
for all members of the temporal grouping. A temporal grouping without temporal constraints emphasizes the
similarity of temporal relationships of the set and ignores the exact relationship (abstraction transformations on
abstraction constructs and temporal relations are discussed in section 3.4).
Definition 2.7: Let pu = G(pu_1, pu_2, ..., pu_n) be a grouping of similar PUs pu_i, i = 1...n and let R_t be a temporal relationship. Then pu is a temporal grouping of pu_i, noted G_T, with presentational duration G_T.τ, if and only if R_t holds between all pairs (pu_i.τ, pu_{i+1}.τ):
G_T(pu) ⇔ ∀ pu_i, pu_{i+1} ∈ G, R_t(pu_i.τ, pu_{i+1}.τ).
Example 2.2: A sequence of slides where the temporal relationship overlaps holds between every pair of successive slides is a temporal grouping if:
SlideShow = G_T(slide_1, slide_2, ..., slide_n){Overlaps} ⇔
∀ slide_i, slide_{i+1}, Overlaps(slide_i.τ, slide_{i+1}.τ)
SlideShow.τ = [slide_1.τ.start, slide_n.τ.end]
[Figure: a timeline in which τ_slide1, τ_slide2, ..., τ_sliden pairwise overlap within τ_slideshow]
Notice that, for this to be presentationally meaningful, suitable spatial relationships must hold between successive slides.
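Correspondingly, the temporal-grouping condition of Definition 2.7 only requires that the chosen temporal relationship holds between every pair of consecutive member durations; a sketch under the same assumptions as above (hypothetical helper names):

from typing import Callable

def overlaps(a: Interval, b: Interval) -> bool:
    """Allen's overlaps: a starts first and ends strictly inside b."""
    return a.start < b.start < a.end < b.end

def is_temporal_grouping(parts: list[Interval],
                         rel: Callable[[Interval, Interval], bool]) -> bool:
    """Definition 2.7: rel(pu_i.tau, pu_{i+1}.tau) holds for all consecutive pairs."""
    return all(rel(a, b) for a, b in zip(parts, parts[1:]))

def minimal_cover(parts: list[Interval]) -> Interval:
    """Presentational duration of the group: first start to last end."""
    return Interval(min(t.start for t in parts), max(t.end for t in parts))

# Example 2.2: a slide show whose slides pairwise overlap.
slides = [Interval(0, 12), Interval(10, 22), Interval(20, 32)]
print(is_temporal_grouping(slides, overlaps))   # True
print(minimal_cover(slides))                    # Interval(start=0, end=32)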
b. Representing Temporal Relations of multimedia applications at the semantic level
Temporal Aggregation and Grouping defined above are two abstraction constructs posing certain temporal constraints. However, the temporal synchronization information of PUs also refers to temporal constraints on such abstraction constructs, within the constituent PUs of a PU and among PUs. Allen's 13 temporal relationships between time intervals [3], namely before, meets, overlaps, during, starts, finishes and equals and their reverse relationships, form the basic "vocabulary" for temporal constraints at the semantic level.

[Figure 2: A hierarchy of temporal constraints -- the temporal relationships overlaps, before, during, starts, finishes, meets and equal specialize (is-a) the more general constraints sequential and parallel]

The SMA model handles these relationships and their combinations, utilizing logical operators such as negation, conjunction and disjunction (e.g. before or meets) as temporal integrity constraints. Quantitative values of presentational durations (such as concrete start/end time instances of presentational durations as well as lengths of presentational durations, e.g. the actual duration of a video) are ignored and abstracted to the corresponding qualitative information. A generalization hierarchy of temporal relations allows a variable precision at this level (e.g. the hierarchy in Figure 2): less information is given by limiting the set of temporal relationships to sequential and parallel, while more information is provided if qualitative distances (near, far etc.) are captured as well [11].
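One reasonable reading of this generalization hierarchy (an assumption for illustration, not the SMA implementation) is a mapping from the basic Allen relations to the coarser constraints sequential and parallel; the sketch below, again reusing the Interval class, classifies a pair of intervals and then abstracts the result:

def allen_relation(a: Interval, b: Interval) -> str:
    """Name one of the first seven of Allen's 13 relations between a and b;
    the reverse relationships are obtained by swapping the arguments."""
    if a.end < b.start:                        return "before"
    if a.end == b.start:                       return "meets"
    if a.start == b.start and a.end == b.end:  return "equal"
    if a.start == b.start and a.end < b.end:   return "starts"
    if a.end == b.end and a.start > b.start:   return "finishes"
    if a.start > b.start and a.end < b.end:    return "during"
    if a.start < b.start < a.end < b.end:      return "overlaps"
    return "inverse"                           # one of the reverse relationships

GENERALIZE = {   # one step up the hierarchy of Figure 2 (assumed grouping)
    "before": "sequential", "meets": "sequential",
    "overlaps": "parallel", "during": "parallel", "starts": "parallel",
    "finishes": "parallel", "equal": "parallel",
}

rel = allen_relation(Interval(0, 5), Interval(5, 9))
print(rel, "->", GENERALIZE.get(rel, "sequential or parallel"))   # meets -> sequential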
2.3.2.2 Representing Spatial Structure of multimedia applications at the semantic level (SMA’s spatial structure)
a. Spatial Abstraction Constructs
In [52] it is noted that when dealing with spatial objects, i.e. those whose position in space matters to the information system, it is often the case that if objects A, B and C constitute object X, then the positions of A, B and C form a subset of the position of X. Thus spatial aggregation and spatial grouping were introduced as simple extensions to modeling primitives for conveying this extra piece of information. In [37] we identified the particular interpretation of spatial aggregation and grouping in the multimedia domain and defined the corresponding abstraction constructs.
Let U = {pu_i | pu_i ∈ PU, 1 ≤ i ≤ n} and let pu_i.p be the presentational position of pu_i for i = 1...n. Let also R_s(pu.p, pu'.p) denote a spatial relationship R (such as "disjoint" [13]) between pairs of presentational positions.
Spatial Aggregation
Spatial Aggregation is the form of abstraction in which a collection of PUs pu_i, i = 1...n with presentational positions pu_i.p form a higher level PU pu whose presentational position pu.p "spatially consists of" the presentational positions pu_i.p of its constituents (see Example 2.3). The higher level presentational unit pu is called a spatial aggregate of pu_i and its presentational position pu.p equals the union of the pu_i.p of its constituents.
The important features of a Spatial aggregate are: a) it is also a PU with a presentational position attribute, b) the presentational positions of the constituent PUs are "within" (or appear inside) the presentational position of the spatial aggregate, c) the presentational position of the aggregate PU pu.p does not extend beyond the spatial limits of its constituent PUs, d) pu.p is a region without "spatial" holes.
Definition 2.8: Let pu = A(pu_1, pu_2, ..., pu_n) be an aggregation of component PUs pu_1, pu_2, ..., pu_n. Then pu is a Spatial Aggregation of pu_1, pu_2, ..., pu_n, noted A_S, with presentational position A_S.p, if and only if:
A_S(pu) ⇔ ∀ pu_i ∈ A, Covers(pu_i.p, pu.p) and Equal(pu.p, ∪_{i=1..n} pu_i.p)
(Covers and Equal are spatial relationships [13], see Figure 3).
Example 2.3: If a multimedia scene consists of three visual PUs of type image, video and text with presentational positions pu_I.p, pu_V.p and pu_T.p respectively, then:
Scene = A_S(pu_I, pu_V, pu_T) ⇔
Covers(pu_I.p, scene.p) ∧ Covers(pu_V.p, scene.p) ∧ Covers(pu_T.p, scene.p)
∧ Equal(scene.p, pu_I.p ∪ pu_V.p ∪ pu_T.p)
[Figure: a scene layout in which a background image, a video region and text regions jointly cover the scene area]
Examples of Spatial aggregations in common authoring paradigms
i. Macromedia Director [35] paradigm: Within a temporal interval, the visual PUs (e.g. visual media objects) that appear simultaneously on a screen portion form a spatial aggregate; hence, each time frame in the score window defines a spatial aggregate of the PUs that exist in the score channels.
ii. HTML / web authoring paradigm: a web page forms a spatial aggregate of its constituent PUs.
Spatial Grouping
Spatial grouping is the form of abstraction in which a collection (group) of similar PUs pu_i, i = 1...n with presentational positions pu_i.p, spatially related with the same spatial relationship R_s, form a higher level PU pu whose presentational position pu.p "is a spatial group of" the member presentational positions pu_i.p (see Example 2.4). The higher level PU pu is called a spatial group of pu_i with spatial relation R_s and has
presentational position pu.p equal to the minimal cover of the pu_i.p of its members. R_s is a spatial constraint on the set.
Definition 2.9: Let pu = G(pu_1, pu_2, ..., pu_n) be a grouping of similar PUs pu_i, i = 1...n and let R_s be a spatial relationship. Then pu is a spatial grouping of pu_i, noted G_S, with presentational position G_S.p, if and only if R_s holds between all pairs (pu_i.p, pu_{i+1}.p):
G_S(pu) ⇔ ∀ pu_i, pu_{i+1} ∈ G, R_s(pu_i.p, pu_{i+1}.p).
Example 2.4: A group of buttons where the spatial relationship meets holds between every pair of successive buttons is considered a spatial grouping if:
ButtonBar = G_S(button_1, button_2, ..., button_n){Meets} ⇔
∀ button_i, button_{i+1}, Meets(button_i.p, button_{i+1}.p)
ButtonBar.p = [button_1.p.(x1, y1), button_n.p.(xn, yn)]
[Figure: a horizontal bar of adjacent buttons button_1, button_2, ..., button_n]
b. Representing Spatial Relations of multimedia applications at the semantic level
Similarly to the temporal dimension, the 2D topological relations [13], namely disjoint, meets, overlaps, inside, covered_by, covers and equal, form the basic "vocabulary" for spatial constraints of multimedia applications at the semantic level. A generalization hierarchy of 2D topological relations (e.g. Figure 3) allows a variable precision at this level: less information is given by limiting the set of topological relationships to general overlap and disjoint, while more information is provided if qualitative distances (near, far etc.) are captured as well [11].

[Figure 3: A hierarchy of 2D spatial constraints -- the topological relations disjoint, meets, overlaps, inside, covered_by, covers and equal specialize (is-a) the more general constraints general_overlap, within, boundary_overlap, boundary_disjoint and boundary_meets]
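For rectangular presentational positions, part of this topological vocabulary can be computed with simple coordinate comparisons; the sketch below (a hypothetical helper reusing the Region rectangle from Section 2.2.3, and covering only a subset of the relations of Figure 3) is enough to evaluate constraints such as S{disjoint} or S{meets}:

def spatial_relation(a: Region, b: Region) -> str:
    """Coarse classification of two axis-aligned rectangles into part of the
    2D topological vocabulary: disjoint, meet, equal, inside, contains, overlap."""
    if a.x2 < b.x1 or b.x2 < a.x1 or a.y2 < b.y1 or b.y2 < a.y1:
        return "disjoint"
    if a.x2 == b.x1 or b.x2 == a.x1 or a.y2 == b.y1 or b.y2 == a.y1:
        return "meet"                     # boundaries touch, interiors do not overlap
    if (a.x1, a.y1, a.x2, a.y2) == (b.x1, b.y1, b.x2, b.y2):
        return "equal"
    if b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2:
        return "inside"                   # a inside (or covered_by) b
    if a.x1 <= b.x1 and b.x2 <= a.x2 and a.y1 <= b.y1 and b.y2 <= a.y2:
        return "contains"                 # a contains (or covers) b
    return "overlap"

left, right = Region(0, 0, 320, 480), Region(320, 0, 640, 480)
print(spatial_relation(left, right))      # meet -- e.g. satisfies a constraint S{meets}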
2.3.2.3 Representing multimedia content at the semantic level
In conceptual modeling, Classification, a form of abstraction in which a collection of objects is considered a higher
level object class, is used to classify and describe objects in terms of object classes; hence, it is natural in the SMA
model to represent media objects by their corresponding classes (data types of multimedia content). Specific
properties of media objects are ignored at this stage. Abstraction hierarchies of multimedia data classes allow a
variable precision at this level. For instance, the content data type SELECTOR is a generalization of data types
MENU, EVENTER and BUTTON.
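A minimal sketch of such an abstraction hierarchy over content data types follows (the type names come from the SMA-L grammar in Table 1; the is-a links beyond SELECTOR's are illustrative assumptions), showing how a query for the more general type also matches its specializations:

# Parent links of a small "is-a" hierarchy over content data types.
IS_A = {
    "MENU": "SELECTOR", "EVENTER": "SELECTOR", "BUTTON": "SELECTOR",
    "SELECTOR": "INPUT",                      # assumed link, for illustration only
    "IMAGE": "VISUAL_OBJECT", "VIDEO": "VISUAL_OBJECT",
}

def matches_type(actual: str, queried: str) -> bool:
    """True if the actual content data type equals the queried one or specializes it."""
    t = actual
    while t is not None:
        if t == queried:
            return True
        t = IS_A.get(t)
    return False

print(matches_type("BUTTON", "SELECTOR"))   # True: a BUTTON is a SELECTOR
print(matches_type("IMAGE", "SELECTOR"))    # False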
2.4 The SMA model graphical notation (extended-OMT model notation) and the corresponding SMA Definition and Query Language (SMA-L)
2.4.1 The Extended-OMT model graphical notation
The abstraction constructs proposed for representing SMA's presentational structure are generic and can be used with any semantic model which has the minimal functionality of allowing the construction of complex objects from simpler ones. We illustrate this with the Object Modeling Technique (OMT) [46], resulting in an Extended-OMT model.
[Figure 4: Extensions to the OMT Object Model graphical notation -- the OMT aggregation construct is extended to Temporal Aggregation, annotated T{<Temporal Constraints>}, and Spatial Aggregation, annotated S{<Spatial Constraints>}; the OMT association construct is extended to Temporal Grouping, T{<Temporal Constraints>}, and Spatial Grouping, S{<Spatial Constraints>}]
2.4.2 Semantic Multimedia Abstraction (SMA) Definition and Query Language (SMA-L)
For the representation and manipulation of SMAs, the Semantic Multimedia Abstraction Definition and Query
language (SMA-L) has been defined, the formal syntax of which is given in BNF format (Table 1). The SMA-L
was built on the Extended-OMT model and thus any SMA modeled using the extended-OMT can be represented
with SMA-L. SMA-L is a declarative object-oriented language which:
- allows the representation of the conceptual and presentational structure of SMAs (c_units and p_units represent conceptual and presentational units respectively);
- contains predicates corresponding to the temporal and spatial abstraction constructs (aggregation and grouping) defined for SMAs for forming PUs, as well as to the Presentational Views of PUs (logical, temporal and spatial), allowing users to formulate queries on complex structures of multimedia applications;
- provides a way for defining PUs, and consequently SMAs, at various abstraction levels in terms of their conceptual and presentational properties at the semantic level, through abstraction hierarchies on abstraction constructs, constraints and multimedia data types.
2.4.2.1 Syntax of SMA-L
The BNF notation of the SMA-L syntax is given in Table 1. Words in <italics> denote non-terminal elements of
the language. Clauses in [ ] are optional arguments. Bold is used to denote reserved words.
Table 1: Semantic Multimedia Abstractions Definition and Query Language (SMA-L)
<SMA>                      : <unit> | <unit> <SMA>                  ; an SMA is a sequence of conceptual and/or presentational units
<unit>                     : <c_unit> | <p_unit>
<c_unit>                   : C_UNIT <unit_name>                     ; conceptual unit
                             [TYPE <c_unit_types>]
                             [PRESENTED_BY <p_unit_list>]
<unit_name>                : identifier
<c_unit_types>             : <c_unit_type> | <c_unit_type>, <c_unit_type>
<c_unit_type>              : <simple_c_unit> | <composite_c_unit>
<simple_c_unit>            : identifier | ABSTRACT | LINK (<source_unit_list>) (<target_unit_list>)
<composite_c_unit>         : <abstraction construct> [<{constraint}>]
<p_unit>                   : P_UNIT <unit_name>                     ; presentational unit
                             [TYPE <p_unit_types>]
<p_unit_types>             : <p_unit_type> | <p_unit_type>, <p_unit_type>
<p_unit_type>              : <simple_p_unit> | <composite_p_unit>
<simple_p_unit>            : <content data type> | ABSTRACT
                             | LINK (<source_p_unit_list>) (<target_p_unit_list>)
<content data type>        : CONTENT | MULTIPLEXED_CONTENT          ; content data types can be extended to new types
                             | COMPOSITE | VISUAL_OBJECT | INPUT
                             | OUTPUT | IMAGE | VIDEO | AUDIO | ANIMATION | TEXT | GRAPHICS
                             | PICKER | HOTSPOT | SELECTABLE_CONTENT | STRING | VALUATOR
                             | SELECTOR | MENU | EVENTER | BUTTON
                             | SLIDE_SHOW | INTERACTIVE_IMAGE
<composite_p_unit>         : <abstraction construct> [ : <abstraction view> ] [<{constraint}>]
<abstraction construct>    : GROUP_OF (<member_unit>)
                             | AGGREGATION_OF (<component_unit_list>)
                             | GENERIC (<category_unit_list>)
<abstraction view>         : [<temporal>] [<spatial>] [<logical>]   ; the "view" of presentational units
<temporal>                 : [T] [<{temporal constraint}>]          ; temporal view
<spatial>                  : [S] [<{spatial constraint}>]           ; spatial view
<logical>                  : [<{constraint}>]                       ; logical view
<p_unit_list>              : <p_unit reference> | <p_unit_list>, <p_unit reference>
<member_unit>              : <unit reference>
<component_unit_list>      : <unit reference> | <component_unit_list>, <unit reference>
<category_unit_list>       : <unit reference> | <category_unit_list>, <unit reference>
<source_p_unit_list>       : <p_unit reference> [ : <condition>]
                             | <source_p_unit_list>, <p_unit reference> [ : <condition>]
<target_p_unit_list>       : <p_unit reference> [ : <action>]
                             | <target_p_unit_list>, <p_unit reference> [ : <action>]
<unit reference>           : <c_unit reference> | <p_unit reference>
<c_unit reference>         : <unit_name> | <c_unit_type>
<p_unit reference>         : <unit_name> | <p_unit_type>
<constraint>               : <statement> | not <constraint>
                             | <constraint> and <constraint>
                             | <constraint> or <constraint> | (<constraint>)
<condition>                : <statement> | not <condition>
                             | <condition> and <condition>
                             | <condition> or <condition> | (<condition>)
<action>                   : <statement>
<statement>                : string | function
<temporal constraint>      : <temporal relation> | not <temporal relation>
                             | <temporal constraint> and <temporal constraint>
                             | <temporal constraint> or <temporal constraint>
                             | (<temporal constraint>)
<temporal relation>        : meets | met-by | before | after | during | contains | overlaps | overlapped-by
                             | starts | started-by | finishes | finished-by | equal
                             | sequential | parallel
<spatial constraint>       : <spatial relation> | not <spatial relation>
                             | <spatial constraint> and <spatial constraint>
                             | <spatial constraint> or <spatial constraint>
                             | (<spatial constraint>)
<spatial relation>         : disjoint | meet | overlap | covered_by | covers | inside | contains | equal
                             | g_overlap | within | b_disjoint | b_meets | b_overlap
<query definition>         : SELECT <semantic mm abstraction name>
                             <match statement>
<match statement>          : MATCH (<semantic mm abstraction>)
                             | <match statement> and <match statement>
                             | <match statement> or <match statement>
<semantic mm abstraction name> : identifier
2.5 Related Work
The M Model by Dionisio and Cardenas [12] and the ZYX model by Boll and Klas [7] follow a modeling approach similar to the SMA model. The M Model is a synthesis of the Extended ER and Object-Oriented data models, integrating spatial and temporal semantics with general database constructs; the basic construct introduced is the "stream", an ordered finite sequence of entities or values; substream and multistream (an aggregation of streams that creates new, more complex streams) are the other two basic constructs of the model. In the SMA model, streams, substreams and multistreams are modeled with temporal groupings and temporal aggregations, which are generic extensions of the classic aggregation and grouping constructs. However, the M Model and its MQuery language can
also be used for modeling and querying SMAs. The ZYX model introduces new constructs for multimedia document modeling; the model uses a hierarchical organization for the document structure, an extension of Allen's model (which supports intervals with unknown duration) for modeling temporal synchronization and a point-based
description for modeling spatial layout. The ZYX model is a tree-based model where nodes represent “presentation
elements” -a concept similar to our notion of presentational unit- and each node has a binding point that connects
it to other elements. Spatio-temporal synchronization and interactivity are modeled with temporal, spatial or
interaction elements (par, seq, loop, temporal-p and spatial-p, link, menu etc.); in the SMA model such
relationships are modeled as constraints on presentational units (including temporal/spatial aggregation and
grouping), which allow the modeling of the conceptual and presentational structure in a uniform way. As the ZYX model uses a structure similar to language-based models (XML, SMIL), the abstraction process from these models to ZYX is straightforward and abstraction transformations (discussed in 3.4) can transform ZYX representations to SMA-L ones.
Conceptual modeling has been proposed for document information retrieval in [39], where principles from
database area are used in order to enhance retrieval of multimedia documents; the model focuses on multimedia
documents and is restricted to their logical, layout and conceptual views. Other object-oriented multimedia query
languages with the appropriate extensions/modifications can be used for the same purpose as SMA-L such as: the
Multimedia Query Specification Language along with the object-oriented data model for multimedia databases
proposed by Hirzalla et al. [19] which allows the description of multimedia segments to be retrieved from a
database containing information on media and on spatial and temporal relationships between these media; the
Query language of the TIGUKAT object management system [43];
the general purpose multimedia query
language MOQL [33] which includes constructs to capture the temporal and spatial relations in multimedia data.
2.6 Application: Validation of the MULTIS production approach
Figure 5 depicts a part of the conceptual database schema of the MULTIS system for a series of Point of
Information Systems (POIs), modeled using the extended-OMT model. The corresponding statements in SMA-L
are also given.
[Extended-OMT diagram: conceptual classes POIS, Geographic Area, Place, Area View and Landmark (generalizing Museum, Castle and Hotel), and presentational classes Interactive Map, HotSpot, Image, sub-scene (Image, Video, Text, ButtonList, Button), annotated with constraints such as S{meets}, T{meets}, S{equal}, T{equals} and S{disjoint}]

C_UNIT POIS
  TYPE AGGREGATION_OF (GeographicArea)
C_UNIT GeographicArea
  TYPE AGGREGATION_OF (GROUP_OF (GeographicArea)),
       AGGREGATION_OF (GROUP_OF (Place), GROUP_OF (AreaView))
...
C_UNIT Landmark
  TYPE GENERIC (Museum, ..., Hotel)
...
C_UNIT Hotel
  PRESENTED_BY (AGGREGATION_OF (SubScene, Video))
P_UNIT SubScene
  TYPE AGGREGATION_OF (
    GROUP_OF (IMAGE): T{meets}, S{equal},
    TEXT,
    ButtonList) : T{equal}, S{disjoint}
P_UNIT ButtonList
  TYPE GROUP_OF (Button): T{equal}, S{meets}

Figure 5: Extended-OMT model of MULTIS POIS
3. Semantic Multimedia Abstractions for Querying Large Multimedia Repositories
3.1 The opportunity of abstraction in multimedia information retrieval
Organized units of interactive multimedia material are becoming rapidly available beyond their original format,
namely Compact Disks; the advent of the Web and the appearance of digital libraries enlarge the habitat of such
multimedia units which can now reside anywhere on the Internet, be distributed across local or global networks, or
even have a transient and virtual existence: a net-surfing session on the Web is a multimedia application of this
kind. Although such collections of applications are not organized as proper databases, they are very large
repositories of multimedia information. For large collections of such applications, browsers and query mechanisms
addressing the multimedia data alone, while reasonably well developed, are inadequate: we lack techniques for
efficient generic retrieval of structured multimedia information. To really tap the information resource we need a
different approach for querying and navigating in these repositories, one that would resemble our own way of
recalling information from our minds, human remembering [27].
Consider the following query: find multimedia electronic books explaining grammar phenomena of English
Language where phrasal verbs are explained through a page of a synchronized video and a piece of text in two
languages; the video covers half of the screen and when clicked a translation text appears. This is an abstract
specification of -possibly a part of- a multimedia application and regards its conceptual structure (a book has pages with phrasal verbs), its presentational structure including its spatio-temporal synchronization and its interactivity. It
is exactly with respect to these characteristics that we would like to be able to query and navigate through a
multimedia repository.
3.1.1 Principles of Human Remembering and the Need for mixed granularity levels
According to cognitive science, we think and remember using abstractions [27]; we build abstraction models of
varying granularity that depend on the task at hand as well as on the state of our knowledge in a domain.
Moreover, the process of changing representation levels and the multilevel representation of knowledge are
fundamental in common sense reasoning [44]. Starting at a high abstraction level -coarse granularity- and moving
towards a more detailed one -fine granularity- is a common approach in solving a problem [40]. The technique
used in AI to imitate this process is the use of hierarchies of abstraction models, each one in a different abstraction
level, and the definition of the relations between the different models in a hierarchy [20], [47], [25].
However, when we think and recall information in our minds, we normally mix granularity levels in a single representation. For instance, recalling a place visited, a description may include "an island, where in the harbor there is a castle and a 16th-century church, and there is a small village named "Sigri"". To answer such a query using maps -a mature form of symbolic representation for complex information- we would need a multi-resolution map with only names for some large cities but including details such as street names and museums for others. Consequently, to allow information retrieval congruent with human remembering, we need techniques that support multilevel knowledge representation using various abstraction levels and mixed abstraction levels or resolutions in the same representation (considered either granularity levels at the same abstraction level or hierarchies of different abstraction levels).
In multimedia applications a number of factors affect the choice of abstraction level in both the design and the
retrieval of applications, while an abstract representation may be "more detailed" for one part of an application and
"more abstract" for another. When specifying MULTIS systems [53] we identified the following:
- the user view: conceptual, logical, temporal, spatial, interactivity or content. E.g. if the temporal synchronization matters most, the abstraction level of the other dimensions is kept high.
- the user's knowledge and recollection of the multimedia application in any presentational view. E.g. when looking for an application with a slide show, one might or might not remember -or deem important- the slide synchronization.
- the tightness of a constraint in each dimension, implied by the significance of the behavior under consideration.
- the temporal scope, determined by the temporal interval over which the behavior of the application is analyzed. E.g. is one interested in the behavior over the whole application or over a few seconds of it? If a query focuses on the temporal interval of a slideshow in an application, the slide synchronization may be considered important and specified in detail.
- the spatial scope, determined by the area of the screen over which the behavior is analyzed. E.g. is one interested in the behavior over the whole scene or over a small part of the scene?
- the temporal and spatial grain size: the degree of temporal and spatial precision required in the answer. E.g. is one interested in the exact temporal or spatial synchronization between the presentational units of a scene or just in their temporal and spatial relationships?
- the content grain size: the degree of precision for content types required in the answer, e.g. an image or just a visual object?
- the grain size of the presentational structure.
3.1.2 Reducing complexity in search
Abstraction is used to decrease complexity in various problems, including search; speeding-up search by using
abstraction techniques is a common and widely studied approach in the field of AI [21]. Research aims to find
methods for creating and using abstract spaces to improve the efficiency of classical searching techniques such as
heuristic search (especially state-space search). The basic idea behind these approaches is that instead of directly
solving a problem in the original search space, the problem is mapped onto and solved in an abstract search space;
then the abstract solution is used to guide the search for a solution in the original space (guided search) [21].
In the multimedia domain, reducing the complexity of the search space and hence improving information retrieval
is a "quantitative" objective for introducing abstraction. A repository of SMAs forms an abstract search space, of
which we can have a hierarchy (see Sect. 3.4). An abstract answer to a query can be found by searching in the
hierarchy of abstract spaces, in principle a computationally easier task. Then, a method for using the abstract
answer to guide search can be followed: for instance, to use the length of the abstract solution as a heuristic estimate of
the distance to the goal [45], or to use the abstract solution as a skeleton in the search process (the "refinement
method" [47] or other variants such as path-marking and alternating opportunism [21]).
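To make the guided-search idea concrete, the following is a minimal sketch (ours, not part of the paper) in Python: the length of the abstract solution, in the spirit of [45], is used as a heuristic for an A*-style search in the ground space. The functions abstract, ground_neighbors and abstract_neighbors, the goal tests and the state encoding are hypothetical parameters, and the abstract space is assumed to be an undirected graph.

import heapq
import itertools
from collections import deque

def abstract_distances(abstract_goal, abstract_neighbors):
    # BFS over the abstract space (assumed undirected here), giving the length
    # of the abstract solution from each abstract state to the abstract goal.
    dist = {abstract_goal: 0}
    queue = deque([abstract_goal])
    while queue:
        state = queue.popleft()
        for nxt in abstract_neighbors(state):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist

def guided_search(start, is_goal, ground_neighbors,
                  abstract, abstract_goal, abstract_neighbors):
    # A*-style search in the ground space; h(n) is the abstract solution length.
    h = abstract_distances(abstract_goal, abstract_neighbors)
    tie = itertools.count()
    frontier = [(h.get(abstract(start), 0), next(tie), 0, start, [start])]
    closed = set()
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in closed:
            continue
        closed.add(state)
        for nxt in ground_neighbors(state):
            f = g + 1 + h.get(abstract(nxt), 0)
            heapq.heappush(frontier, (f, next(tie), g + 1, nxt, path + [nxt]))
    return None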
3.1.3 Approximate match retrievals - Filtering large multimedia repositories
There are many cases where queries aim to filter out interesting parts of large repositories; in such cases
information retrieval is based on the similarity of the repository's data to the user's query, and approximate matching
techniques are used for query evaluation [26], [1]. In order to filter large multimedia repositories in terms of the
conceptual and presentational properties of multimedia applications, the user should be able to pose approximate
queries and get a set of approximate answers that match the given query to a certain similarity degree. Here, the
similarity measure should also agree with human perception of similarity in multimedia applications.
Abstraction and abstraction hierarchies seem to have a significant role in filtering large multimedia repositories.
An approximate query has the form of an SMA and, given an abstract search space of SMAs, approximate query
evaluation is performed as a "normal" search process in a simplified abstract search space. Abstraction hierarchies
are used as a basis in the query evaluation process and in the relaxation of queries (see Sect. 3.4).
3.1.4 Semantic Multimedia Abstractions and Existing Types of Metadata
One way to query the conceptual and presentational properties of multimedia applications is to capture these
properties by using metadata. The most significant works on metadata for digital media are presented in [30]. In
[6], the metadata used for multimedia documents are classified according to the type of information captured. The
categories include content-descriptive metadata, metadata for the representation of media types, metadata for
document composition; composition-specific metadata are knowledge about the semantics of logical components
of multimedia documents, their role as part of a document and the relationships among these components.
Metadata in SGML [49] and XML[14] documents are organized in document type definitions (DTDs) which are
themselves part of the metadata and contain “element types” of metadata. Additionally, there exist metadata for
collections of multimedia documents (DFR standard) [24]. Statistical metadata and metadata for the logical
structure of documents are expected to optimize query processing on multimedia documents. In [28] a three-level
architecture consisting of the ontology, metadata and data levels is presented to support queries that require
correlation of heterogeneous types of information; in this approach, metadata are information about the data in the
individual databases and can be seen as an extension of the database schema.
Metadata are usually stored either as external -text- files or along with the original information, while object-relational database systems could also be employed to manage them. In [18], Grosky et al. propose content-based
metadata for capturing information about a media object that can be used to infer information regarding its content
and use these metadata to intelligently browse through a collection of media objects; image and video objects are
used as surrogates of real-world objects and metadata are modeled as specific classes being part-of an image/video
class.
SMAs are a type of metadata that capture the conceptual and presentational properties of multimedia applications
at the semantic level. Based on existing types of metadata and extending them (e.g. extending content-based
metadata described in [18] from image and video surrogates of real world objects to PUs), SMAs can be viewed as
metadata on PUs, capturing both semantic information about real world objects that a PU represents and
information about the presentational properties of the PU. Introducing a new class as part-of any PU as a way to
model metadata implies that we should put the conceptual and presentational structure as normal class attributes in a
metadatabase scheme, which seems rather cumbersome. The approach of DTDs in SGML and XML seems quite
appropriate, as it captures structural information of multimedia documents, and the XML-based recommendation
SMIL [50] would allow capturing information on the synchronization of media objects. However, there is no link
between objects -or element types- in an XML DTD and media objects in SMIL documents; consequently, querying
the spatio-temporal synchronization of a presentation of a real-world object requires correlation between these
sources. Finally, for query optimization we would need higher level metadata (statistical metadata, metadata for
collections of applications, etc.) [18].
SMAs are modeled as a database schema using extended semantic models and can be stored either in a metadatabase containing SMAs’ representations or along with each PU of multimedia applications, forming a
distributed metadatabase. Thus, queries on the conceptual and presentational structure of multimedia applications
are queries in such a metadatabase. Hierarchies of SMAs allow the definition of higher level metadata (discussed
in detail in Section 3.4).
3.2 On the Abstract Multimedia Space
Abstraction has been defined [17] as a mapping between two representations of a problem which preserves certain
properties. The set of concrete, original representations is the Ground space while an Abstract space is a set of
their abstract representations.
Definition 3.1 A “Ground Multimedia Space” is a set of concrete representations of multimedia applications
represented in various models and languages e.g. a set of HTML documents, a set of XML-based documents [14],
a set of Macromedia Director applications [35] form Multimedia Ground Spaces.
Definition 3.2: An Abstract Multimedia Space is a repository of Semantic Multimedia Abstractions (SMAs); we
call this repository Semantic Multimedia Abstractions Database or simply an SMA space.
Without loss of generality we use the Extended-OMT model / SMA-L for representing the "content" of the
SMA space. Hence, an SMA space is a repository of Extended-OMT object models or sets of statements in SMA-L.
The SMA space is a set of metadata of multimedia applications; it is a distributed repository of SMAs if metadata
are stored along with the multimedia applications of the ground multimedia space.
[Figure 6: Ground and Abstract Multimedia Spaces — the Ground Multimedia Space (databases of multimedia applications, HTML/XML documents with metadata, SMIL documents, generic specifications (MTS)) mapped onto the Abstract Multimedia Space (the SMA space) of SMAs.]
Figure 6 shows the two different types of existing repositories: Multimedia Repositories (Ground Multimedia
Spaces), which contain multimedia applications too loosely organized to be called a database, and the SMA space
(Multimedia Abstract Space), whose instances are SMAs, each representing several multimedia applications.
Note that a number of concrete multimedia applications may correspond to the same SMA (Figure 7). Moreover,
given the hierarchies of abstraction constructs and constraints (defined in section 3.4) a specific multimedia
representation can have many SMAs in various abstraction levels.
3.2.1 Querying the Abstract Multimedia Space: Requirements from Query languages
A query on the conceptual and presentational structure of multimedia applications is an SMA and corresponds to
the key value in the query evaluation process. The search space is the SMA Space. The output of the query is the
set of SMAs that "match" this key SMA.
[Figure 7: Semantic Multimedia Abstractions from a concrete multimedia application — the same scene, containing Image1 and Image2 with exact durations, is abstracted to one SMA with the relationships T{starts}, S{g_overlap} and T{meets}, and to a more abstract SMA with T{parallel}, S and T{sequential}.]
"Matching" the key SMA means that the retrieved SMAs contain a
fragment which “can be abstracted to the key SMA” if abstraction transformations are applied on it. Hence, the
query language should at least have the expressive power of the language for the representation of the SMA space.
Among the requirements for the query language are:
i. to support queries on combinations of conceptual and presentational properties of multimedia applications and to allow queries on complex structures of multimedia applications (by providing predicates corresponding to abstraction constructs);
ii. to support queries on the temporal and spatial synchronization of multimedia applications at various abstraction levels;
iii. to allow mixing abstraction levels in a single query and logical combinations of queries;
iv. given that the structure of the queried Abstract Multimedia Space is unknown to the user, to provide mechanisms for efficiently filtering multimedia applications by fuzzy and incomplete queries which, together with techniques for partial matching in query evaluation and relaxation of queries, would improve the efficiency of the query evaluation process.
Based on the proposed SMA model, a query can be defined: a) by a set of statements in SMA-L; b) graphically, by
drawing the Extended-OMT model scheme that represents the query; or c) visually, by an approximate description
of the presentational structure of the application.
Towards this visual approach, a software tool has been developed, initially for the purpose of partially specifying
the presentational structure of MULTIS systems (part of the MTS). The tool is customizable by application
domain; it allows the WYSIWYG definition of spatial PUs and produces the corresponding statements in SMA-L.
3.2.1.1 Example Queries in the SMA model / SMA-L
To illustrate how the SMA model and SMA-L meet the above requirements, we give a number of example queries
expressed in SMA-L or with the Extended-OMT model (our SMA model).
i. Queries on the presentational spatial structure of PUs
Example 3.1: "Get multimedia applications containing scenes with two photos, where the first is either covered_by the second or it is within it". The spatial constraint here can be expressed either by a combination of two simple spatial relationships or by a generalized relationship such as "within".
[Extended-OMT scheme: a scene aggregating Image1 and Image2 with S{covered_by ∨ inside}, which is equivalent to a scene aggregating Image1 and Image2 with S{within}.]
ii. Queries on the presentational temporal structure of PUs
Example 3.2: "Get multimedia applications containing a slide-show where each slide is synchronized with a piece of audio".
[Extended-OMT scheme: a SlideShow grouping "sync slide_audio" PUs with T{meets}; each "sync slide_audio" aggregates an Image and an Audio with T{equals}, over successive intervals τ1, τ2, τ3, τ4.]

SELECT mm_applications
MATCH (P_UNIT SlideShow)
P_UNIT SlideShow
  TYPE = GROUP_OF(sync slide_audio): T{meets}
P_UNIT sync slide_audio
  TYPE = AGGREGATION_OF(IMAGE, AUDIO): T{equals}
iii. Queries on a combination of the conceptual structure, data types and presentational structure of multimedia applications
Example 3.3: The Extended-OMT model and the SMA-L statements of the example query given in Section 3.1 are:
[Extended-OMT scheme: the conceptual units English Grammar and Phrasal Verb; Phrasal Verb is presented by Phr_V_page, a generic PU of a Video_mode and a Text_mode; Video_mode groups "sync v-t" PUs (Video and Text with T{equals}, S{disjoint}) with T{meets}, Text_mode groups "sync t-t" PUs with T{meets}; a LINK on Video leads from Video_mode to Text_mode.]

SELECT Electronic Books
MATCH (C_UNIT English Grammar)
C_UNIT English Grammar
  TYPE = GROUP_OF(Phrasal Verb)
C_UNIT Phrasal Verb
  PRESENTED_BY Phr_V_page
P_UNIT Phr_V_page
  TYPE GENERIC(Video_mode, Text_mode)
P_UNIT Video_mode
  TYPE GROUP_OF(sync v-t): T{meets}
P_UNIT sync v-t
  TYPE AGGREGATION OF(Video, Text): T{equals}, S{disjoint}
…..
P_UNIT Video
  TYPE LINK (Video_mode, Text_mode)
iv. Approximate queries / queries with incomplete information on the presentational properties of multimedia applications
Example 3.4: Let an SMA in an SMA space contain a spatial aggregation of the PUs slide show, text and video. A query looking for applications containing a scene with an image and a text will not retrieve this SMA, although it represents a rather similar scene.
[In the SMA space: a Scene aggregating Image, Text and Video with S{disjoint}; the query: a Scene aggregating Image and Text.]
Considering such cases as queries with incomplete information in terms of the presentational structure of SMAs, the query evaluation process will search for SMAs that match the given query to a certain similarity degree; e.g. if the similarity measure considers a slide show of images similar to one image, and a scene with image and text similar to a scene with image, text and video, the SMA will match the query. In Section 3.4 we define the appropriate abstraction transformations to allow such approximate queries.
Example 3.5: Consider again the query in Example 3.2. A user query could be:
P_UNIT SlideShow
TYPE = GROUP_OF (AGGREGATION_OF (IMAGE,AUDIO):T{equals}): T{meets}
The PU “sync slide_audio” does not exist in the query although it exists in the SMA as an internal PU added for
modeling purposes (a direct grouping of aggregation constructs seems awkward in OMT). In this case the query
evaluation process returns the above SMA by applying abstraction transformations on the SMA to abstract out the
internal PU (provided the remaining SMA structure matches the user query).
v. Query relaxation
An overly restrictive query may result in an empty or irrelevant set of SMAs while the SMA space may contain
similar SMAs that the user would like to retrieve. Relaxing the query by abstraction transformations might result in
useful output.
3.3 Models and Techniques for Creating and Searching the Abstract Multimedia Space
The creation of the Abstract Multimedia Space from the existing repositories of multimedia applications (Ground
Multimedia Space) is the basic prerequisite for using abstraction techniques in searching large multimedia
repositories. However interesting the example of the MTS is -it implies the availability of abstract representations
of sets of multimedia applications- it is not representative. By and large, most multimedia applications have their
own specific conceptual and presentational structure, and an SMA representation of them is not readily available.
We need techniques for getting SMAs from multimedia applications themselves and from more detailed SMAs.
Creating the abstract space is a key issue in related fields in AI and a number of techniques and theories have been
proposed which form a good theoretical base for the creation and manipulation of the SMA space. In this section
we review some of these techniques in order to provide the basic concepts and the terminology we use in the
creation and manipulation of the SMA space. The completeness and consistency of the abstract space, as well as the
computational savings gained by using abstractions, are beyond the scope of this paper.
3.3.1 Abstraction in Artificial Intelligence – Related Work
Abstraction is generally defined as a mapping between a ground and an abstract space which preserves certain
properties. The abstraction methods vary according to:
- the models used for the representation of the Ground and the Abstract space (formal systems, languages or graphs), which results in different types of abstraction methods: model-based abstraction, graph-based abstraction, syntactic abstractions, domain abstractions, abstractions as graph homomorphisms, abstraction based on irrelevance reasoning, behavioral abstractions, time-based abstractions, etc.;
- whether the languages/models of the Ground and the Abstract space are the same or not;
- the multiplicity of models used for reasoning and the use of hierarchies of abstraction spaces;
- the goal of abstraction: problem solving, planning, search, etc.
Giunchiglia and Walsh in [17] present a theory of reasoning with abstraction that unifies most previous work in the
area. Abstraction is defined as the process of mapping a representation of a problem (called the Ground
representation) onto a new representation (called the Abstract representation), preserving certain desirable
properties. For the representation of a problem they use formal systems (a formal description of a theory, described as
a set of formulae representing the statements of the theory, written in a language Λ which is a set of well-formed
formulae), and abstraction is defined as a mapping between formal systems: "An abstraction, written
f : Σ1 ⇒ Σ2, is a pair of formal systems (Σ1, Σ2), with languages Λ1 and Λ2 respectively, and an effective total
function f_Λ : Λ1 → Λ2". Σ1 is called the ground space, Σ2 the abstract space and f_Λ the mapping function
between them. The theory was used to study the properties of abstraction mappings, to analyze various types of
abstractions and to classify previous work. A basic classification studied is whether the set of theorems of the
abstract theory is a subset, a superset or equal to the set of theorems of the base theory.
Domain abstractions are abstractions which map a domain (constants or function symbols) onto a smaller and
simpler domain [17]. Hobbs in [20] suggests a theory of granularity in which different constants in ground space
are mapped onto -not necessarily different- constants in the abstract space according to an indistinguishability
relation. Imielinski in [23] proposes domain abstractions where objects in a domain are mapped onto their
equivalence classes; a domain in this work is considered to be the domain of a knowledge base and the domain
abstraction is defined by means of an equivalence relation on this domain, determined by what is of interest to the
user, what features the system should hide or by the degree of the required approximation. A similar approach
based on irrelevance reasoning is presented by Levy in [32]; he considers the problem of simplifying a knowledge
base and he presents a general schema for automatically generating abstractions tailored for a given set of queries,
by deciding which aspects of a representation are irrelevant to the query and removing them from the knowledge
base.
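A domain abstraction of this kind can be sketched as a plain mapping of ground constants onto equivalence classes; the sketch below is ours and the particular classes (media files onto content types) are illustrative only, not taken from [20], [23] or [32].

# Illustrative sketch of a domain abstraction: constants of the ground domain
# are mapped onto equivalence classes chosen by an indistinguishability
# relation. The particular classes below are hypothetical examples.
EQUIVALENCE_CLASS = {
    "photo.jpg": "IMAGE",
    "logo.gif": "IMAGE",
    "clip.mpg": "VIDEO",
    "theme.wav": "AUDIO",
}

def domain_abstraction(ground_fact):
    """Rewrite a (predicate, constant) fact over the abstract domain."""
    predicate, constant = ground_fact
    return predicate, EQUIVALENCE_CLASS.get(constant, constant)

# e.g. ("displays", "photo.jpg") is abstracted to ("displays", "IMAGE")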
Abstraction is a widely studied technique for speeding up search and problem solving. Research in this area
investigates methods for the creation of the abstract space and for the use of the abstract solution to guide search. A
very common representation of a search space (or a problem space) is the STRIPS notation [15] where nodes
(states) represent sets of sentences in a formal language and operators that map one state to the other represent
relations between states (implicit graph representation). The idea behind guided search is to create from the
original graph a “simpler” graph, to find a solution in the simpler graph and use it to guide search in the original
graph. An abstraction is a mapping from one search space to another, and a widely used type of mapping between
search spaces is the homomorphism, a many-to-one mapping that preserves behavior. Homomorphism is used in
the ALPINE abstraction system [31], where ordered sequences of abstraction spaces are formed by dropping
certain terms from the language of a problem space, and in the STAR method of abstraction [22], where the search
space is represented by a graph and abstraction is defined as a mapping between graphs based on graph
homomorphism. The STAR method creates abstract spaces by selecting a state -the "hub" state- and grouping
together its neighbors within a given distance -the radius of abstraction; the process is repeated until all states
have been assigned to an abstract space, consequently creating a hierarchy of abstraction spaces.
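A rough sketch of this hub-and-radius grouping, as we read [22], is given below; the graph encoding and the order in which hubs are picked are simplifying assumptions of the sketch.

# Sketch of STAR-style abstraction (our reading of [22], simplified): pick a
# "hub" state, group every unassigned state within the given radius into one
# abstract state, and repeat until all states are assigned.
from collections import deque

def star_abstraction(graph, radius):
    """graph: dict state -> iterable of neighbor states (undirected).
    Returns a dict mapping each state to the hub of its abstract state."""
    assignment = {}
    for hub in graph:                      # hubs are taken in some order
        if hub in assignment:
            continue
        assignment[hub] = hub
        frontier = deque([(hub, 0)])
        while frontier:                    # BFS limited to the radius
            state, depth = frontier.popleft()
            if depth == radius:
                continue
            for nxt in graph[state]:
                if nxt not in assignment:
                    assignment[nxt] = hub
                    frontier.append((nxt, depth + 1))
    return assignment

# Applying star_abstraction to the abstract graph again (with abstract states
# as nodes) yields the next level of the hierarchy.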
In [41] Nayak and Levy propose a Semantic Theory of Abstraction as a different approach to the syntactic theory
of abstraction. Abstraction is defined as a model-level mapping (i.e. the decision what to abstract is made at the
model level, using knowledge about relevant aspects of the domain) from a detailed theory to an abstract one and is
viewed as a two step process where the intended domain model is first abstracted and then a set of abstract
formulas is constructed to capture the abstracted domain model. A special case of this Semantic Theory of
Abstraction is presented in [4] where structural and behavioral abstractions are defined for reducing complexity in
diagnosis: Structural abstraction is the abstraction where an abstracted component is created by grouping several
subcomponents together and whose behavioral description can be derived from the behavioral description of its
constituents. Behavioral abstraction is the abstraction where a component is assigned behavioral models at
varying levels of precision and whose behavioral model can be automatically derived by using abstraction axioms.
In this work, the detailed and the abstract model are both represented in the same language.
To overcome the restrictions of abstractions by dropping sentences and mainly the dependency of the abstraction’s
effectiveness on the representation of the domain (knowledge engineers had to represent a domain in a certain way
that is not always feasible), Bergmann and Wilke [5] proposed abstractions with complete change of representation
language at the abstract level in which the detail is reduced (e.g. by abstracting the quantitative value expressed in
the sentence towards a qualitative representation). The prerequisites of the approach are the definition of the
abstract language by the domain expert and a domain abstraction theory in which the set of admissible ways of
abstracting states is predefined.
3.3.2 On the creation of the SMA space (Abstract Multimedia Space)
As a basis for the discussion on the creation of the SMA space, we first define the Ground and Abstract Multimedia
Space. Let M_L denote the representation of a multimedia application M in a language/model L. Let also ℳ_L
denote a set of representations of multimedia applications in L.
Definition 3.3: If L_g is a language/model for the representation of concrete instances of multimedia applications
(e.g. HTML, Macromedia Director [35], XML [14]) then ℳ_Lg is a Ground Multimedia Space. If L_a is a
language/model for the representation of SMAs then ℳ_La is an SMA space. ℳ_SMA-L denotes an SMA space where
SMA-L is used for the representation of SMAs.
Definition 3.4: An SMA transformation F is a function F : ℳ_L1 → ℳ_L2 which preserves the conceptual and
presentational properties, at the semantic level, of all multimedia applications in ℳ_L1, and for which F(m) ∈ ℳ_L2 is
"simpler" than m ∈ ℳ_L1. Two special cases of SMA transformations are of particular interest. The first is when ℳ_L1
is a ground multimedia space and ℳ_L2 an SMA space, in which case F is a base SMA transformation, denoted
F_base; for instance, F_base : ℳ_Lg → ℳ_SMA-L. The second is when L1 = L2, i.e. ℳ_L1 and ℳ_L2 are both SMA
spaces with the same language, and F is an SMA-to-SMA transformation; for instance, F : ℳ_SMA-L → ℳ′_SMA-L.
For the representation of concrete multimedia applications there exists a variety of languages/models [35], [19],
[33], [10] and standards [49], [14], [50]. The base SMA transformation is an abstraction by change of the
representation language, an approach similar to that proposed in [5] (briefly described in Sect. 3.3.1); it follows the
model-based abstraction approach proposed in [41] (a two-step process where the first stage is the definition of
SMA-L, which captures an intended SMA space).
To complete the definition of an F_base we should define all the admissible abstraction mappings; however, to do
this we must use a specific L_g for the ground multimedia space. In Section 3.5 we present a prototype system we
developed for the creation of an Abstract Multimedia Space: we consider L_g to be XML-based languages (XML
DTDs and SMIL documents) and L_a to be SMA-L, and we define the set of admissible abstraction mappings for the
F_base SMA transformation. In this specific example, for the complete definition of an F_base SMA transformation we
considered the following types of admissible abstraction mappings (similar types of abstraction mappings can be
defined for other L_g's):
1. Domain abstraction mappings, where objects considered "identical" in terms of certain qualities or characteristics are mapped onto their equivalence class, e.g. media objects onto their equivalent <content data types>, real-world objects onto their equivalent <c_unit_type>.
2. Temporal abstraction mappings, where a concrete domain of quantitative temporal values (i.e. pairs of start and end time points and exact durations of PUs) is mapped onto a domain of temporal relations (qualitative representation); a sketch of such a mapping is given after this list.
3. Spatial abstraction mappings, where a concrete domain of quantitative spatial values (i.e. exact positions of spatial PUs) is mapped onto a domain of spatial relations (qualitative representation).
4. Structural abstraction mappings, where a) a group of (similar) PUs is mapped onto a higher level PU, their "grouping", or b) a set of component PUs is mapped onto a higher level PU, their aggregate. Structural abstraction mappings apply to logical, temporal and spatial Presentational Views (see Definition 2.2).
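To make the temporal abstraction mapping (type 2 above) concrete, here is a minimal sketch of ours: the exact start and end times of two PUs are mapped onto a qualitative temporal relation in the style of Allen [3]. Representing a PU's presentation interval as a (start, end) pair is an assumption made only for this sketch.

# Sketch of a temporal abstraction mapping: exact start/end times of two PUs
# are mapped onto a qualitative temporal relation in the style of Allen [3].
# The (start, end) tuple encoding of a PU is an assumption of this sketch.

def allen_relation(a, b):
    """a, b: (start, end) intervals with start < end; returns a relation name."""
    (s1, e1), (s2, e2) = a, b
    if s1 == s2 and e1 == e2:
        return "equal"
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    if s2 < s1 < e2 < e1:
        return "overlapped-by"
    # the remaining configurations are the inverses of before/meets
    return "after" if s1 > e2 else "met-by"

# e.g. allen_relation((0, 5), (5, 9)) -> "meets"; a further SMA-to-SMA
# transformation could abstract both "meets" and "before" to "sequential".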
The F_base SMA transformation was defined as a mapping from sets of valid statements in an L_g to a set of valid
statements representing PUs in SMA-L. A first step towards the complete definition of an F_base is to identify PUs in L_g.
For instance, Elements in an XML DTD or statements within <smil> </smil> tags in SMIL define PUs. A PU is
considered the basic unit on which the F_base SMA transformation is defined; it corresponds to a sentence in
a language-based abstraction approach or to a state in a graph-based approach.
3.4 SMA-to-SMA Transformations and SMA Hierarchies
Creating hierarchies of abstraction spaces by repeating the abstraction process on the generated abstract spaces is a
technique used in the field of AI for approximate search -in general, for reasoning with approximation. In this
section, we introduce the basics for generating hierarchies of SMA spaces by applying abstraction transformations
(beyond the base SMA transformation F_base). Although hierarchies of abstraction spaces can be generated by
changing the representation language, here we assume the same language L_a for all abstraction levels in an SMA
space and exemplify it with SMA-L. The approach of applying transformation rules has also been followed in [26]
for the purpose of answering queries in terms of similarity of objects (objects that approximately match a pattern).
A related approach for answering and relaxing queries on structured multimedia database systems is presented in
[36].
A complete definition of an SMA-to-SMA transformation is given by defining all the admissible abstraction
mappings (admissible SMA-to-SMA transformations).
3.4.1 Types of admissible SMA-to-SMA transformations
1. Relaxing Constraints by adding a disjunct or dropping a conjunct: Let C be a set of constraints in SMA-L. The admissible SMA-to-SMA transformations for relaxing constraints c ∈ C are:
a. Adding a disjunct, F_dis: ∀ c, c′ ∈ C : c ∨ c′ is a relaxation of c. The transformation rule maps a constraint onto a more generic one according to a generalization relation. SMA-to-SMA transformations by adding a disjunct are defined on temporal and spatial constraints (denoted F_dis^t and F_dis^s respectively).
b. Dropping a conjunct, F_drop: ∀ c, c′ ∈ C : c is a relaxation of c ∧ c′. This also implies that ∀ c ∈ C, the empty constraint ⊤ is a relaxation of c. SMA-to-SMA transformations by dropping a conjunct are defined for logical, spatial and temporal constraints (denoted F_drop^l, F_drop^s and F_drop^t respectively).
2. Structural SMA-to-SMA Transformations on abstraction constructs:
a. Unifying Structural SMA-to-SMA transformations, F_struct: this transformation rule maps a logical, temporal or spatial set of PUs which form an aggregation or a grouping onto a single PU.
b. Dropping elements of aggregates, F_struct^drop: defined for the component PUs of aggregation constructs.
3. Relaxing Content Data types by adding a disjunct: Let D be a set of Content Data Types in SMA-L. The admissible SMA-to-SMA transformation for relaxing content data types d ∈ D is adding a disjunct, F_dis^d: ∀ d, d′ ∈ D : d ∨ d′ is a relaxation of d. The transformation rule maps a content data type onto a more generic one according to a generalization relation.
To specify these types of admissible SMA-to-SMA transformations on SMAs, we first define SMA-to-SMA transformations on PUs, based on the relation ⊒a "more or equally abstract than" on the presentational properties of PUs. SMA transformations on SMAs are syntheses of SMA transformations on their PUs.
3.4.2 Relaxing Constraints by adding a disjunct: F_dis
3.4.2.1 On <temporal constraints>: F_dis^t
Definition 3.5: If T is the domain of temporal constraints, the relation ⊒a "more or equally abstract than" on T is defined as follows: ∀ t, t′ ∈ T : t ∨ t′ ⊒a t.
For instance, if T = {meets, met-by, before, after, during, contains, overlaps, overlapped-by, starts, started-by, finishes, finished-by, equal, sequential, parallel} then:
(1) sequential ⊒a t for t ∈ {meets, met-by, before, after, sequential}, where sequential = meets ∨ met-by ∨ before ∨ after
(2) parallel ⊒a t for t ∈ {during, contains, overlaps, overlapped-by, starts, started-by, finishes, finished-by, equal}, where parallel = during ∨ contains ∨ overlaps ∨ overlapped-by ∨ starts ∨ started-by ∨ finishes ∨ finished-by ∨ equal
(3) ∀ t ∈ T, "not t" ⊒a every t′ ∈ T − {x | t ⊒a x} (e.g. "not parallel" ⊒a any relationship of the set {meets, met-by, before, after, sequential})
We denote by ⊒a* the transitive closure of the relation ⊒a.
Let pu denote the representation of a PU in SMA-L. Let also Pu denote a set of representations of PUs in SMA-L.
Definition 3.6: Let pu : T{t1, t2 ... tn} denote a pu ∈ Pu with a set of <temporal constraints> t1, t2 ... tn.
F_dis^t : Pu → Pu, pu′ : T{t1′, t2′ ... tn′} = F_dis^t(pu : T{t1, t2 ... tn}) iff ∀ ti, ∃ ti′ ∈ T : ti′ ⊒a* ti
3.4.2.2 On <spatial constraints>: F_dis^s
Definition 3.7: If S is the domain of spatial constraints, the relation ⊒a "more or equally abstract than" on S is defined as follows: ∀ s, s′ ∈ S : s ∨ s′ ⊒a s.
For instance, if S = {disjoint, meet, overlap, covered_by, covers, inside, contains, equal, g_overlap, within, b_disjoint, b_meets, b_overlap}, then:
(1) b_overlap ⊒a s for s ∈ {meet, overlap, covered_by, covers, equal, b_overlap}
(2) b_disjoint ⊒a s for s ∈ {disjoint, inside, b_disjoint}
(3) b_meets ⊒a s for s ∈ {meet, covered_by, covers, equal, b_meets}
(4) within ⊒a s for s ∈ {inside, covered_by, covers, equal, within}
(5) g_overlap ⊒a s for s ∈ {b_overlap, within, g_overlap}
(6) ∀ s ∈ S, "not s" ⊒a every s′ ∈ S − {x | s ⊒a x}
Definition 3.8: Let pu : S{s1, s2 ... sn} denote a pu ∈ Pu with a set of <spatial constraints> s1, s2 ... sn.
F_dis^s : Pu → Pu, pu′ : S{s1′, s2′ ... sn′} = F_dis^s(pu : S{s1, s2 ... sn}) iff ∀ si, ∃ si′ ∈ S : si′ ⊒a* si
3.4.3 Relaxing Constraints by Dropping a conjunct: F_drop
Definition 3.9: Let T, S, L be sets of temporal, spatial and logical constraints in SMA-L. The relation ⊒a "more or equally abstract than" on T, S, L is defined as follows:
∀ t ∈ T : ⊤ ⊒a t and ∀ t, t′ ∈ T : t ⊒a t ∧ t′
∀ s ∈ S : ⊤ ⊒a s and ∀ s, s′ ∈ S : s ⊒a s ∧ s′
∀ l ∈ L : ⊤ ⊒a l and ∀ l, l′ ∈ L : l ⊒a l ∧ l′
Definition 3.10: Let pu : L{l}, T{t}, S{s} denote a pu ∈ Pu with sets of <logical>, <temporal> and <spatial> constraints, where t = t1 ∧ t2 ∧ ... ∧ tk, s = s1 ∧ s2 ∧ ... ∧ sn and l = l1 ∧ l2 ∧ ... ∧ lm.
F_drop^{l,t,s} : Pu → Pu, pu′ : L{l′}, T{t′}, S{s′} = F_drop^{l,t,s}(pu : L{l}, T{t}, S{s}) iff t′ ⊒a* t and s′ ⊒a* s and l′ ⊒a* l

3.4.4 Unifying Structural SMA-to-SMA transformations: F_struct
3.4.4.1 On Aggregation Constructs: F_struct^{t_a} and F_struct^{s_a}
Unifying Structural SMA-to-SMA Transformations on Temporal Aggregation, F_struct^{t_a}, follow; the definitions for Spatial Aggregation, F_struct^{s_a}, are similar.
Definition 3.11: Let pu1, pu2 ... pun be temporal PUs and pu1.τ, pu2.τ ... pun.τ their presentational durations. The relation ⊒a "more or equally abstract than" on temporal PUs is defined as follows:
If ∃ pu ∈ Pu : pu = A_T(pu1, pu2 ... pun) according to Definition 2.6, then pu ⊒a <pu1, pu2 ... pun>
(where pu ⊒a <pu1, pu2 ... pun> denotes that pu is "more or equally abstract than" the whole set of PUs <pu1, pu2 ... pun>).
Definition 3.12: Let A1_T, A2_T ... An_T be temporal aggregation PUs and A1_T.τ, A2_T.τ ... An_T.τ their presentational durations. If ∃ pu ∈ Pu : pu = A_T(A1_T, A2_T ... An_T) according to Definition 2.6, then pu ⊒a <A1_T, A2_T ... An_T>
(A set of temporal aggregations can be abstracted to a single -higher level- temporal aggregation if the set of its constituent temporal aggregations forms a higher level temporal aggregation.)
Definition 3.13: Let pu, pu′ ∈ Pu be two PUs, where pu = A_T(A1_T, A2_T ... An_T) and pu′ = A_T(A1_T′, A2_T′ ... Am_T′), m ≤ n.
F_struct^{t_a} : Pu → Pu, pu′ = F_struct^{t_a}(pu) iff ∀ Ai_T′ ∈ pu′, ∃ <Ai_T ... Ai+k_T> ⊆ pu, k ≥ 0 : Ai_T′ ⊒a* <Ai_T ... Ai+k_T>
3.4.4.2 On Grouping Constructs: F_struct^{t_g} and F_struct^{s_g}
Unifying Structural SMA-to-SMA Transformations on Temporal Grouping, F_struct^{t_g}, follow; the definitions for Spatial Grouping, F_struct^{s_g}, are similar.
Definition 3.14: Let pu1, pu2 ... pun be temporal PUs and pu1.τ, pu2.τ ... pun.τ their presentational durations. The relation ⊒a "more or equally abstract than" is defined as follows:
If ∃ pu ∈ Pu : pu = G_T(pu1, pu2 ... pun) according to Definition 2.7, then pu ⊒a <pu1, pu2 ... pun>
Definition 3.15: Let G1_T, G2_T ... Gn_T be temporal groupings (PUs), G1_T.τ, G2_T.τ ... Gn_T.τ their presentational durations and Rt1, Rt2 ... Rtn their temporal relationships.
If ∃ pu ∈ Pu : pu = G_T(G1_T, G2_T ... Gn_T) according to Definition 2.7, then pu ⊒a <G1_T, G2_T ... Gn_T>
(A set of temporal groupings can be abstracted to a single -higher level- temporal grouping if the set of its member temporal groupings forms a higher level temporal grouping; the temporal relationship of the higher level temporal grouping is a generalized temporal relationship of the member temporal groupings.)
Definition 3.16: Let pu, pu′ ∈ Pu be two PUs, where pu = G_T(G1_T, G2_T ... Gn_T) and pu′ = G_T(G1_T′, G2_T′ ... Gm_T′), m ≤ n.
F_struct^{t_g} : Pu → Pu, pu′ = F_struct^{t_g}(pu) iff ∀ Gi_T′ ∈ pu′, ∃ <Gi_T ... Gi+k_T> ⊆ pu, k ≥ 0 : Gi_T′ ⊒a* <Gi_T ... Gi+k_T>

3.4.5 SMA-to-SMA transformations by Dropping elements (components) of aggregate constructs: F_struct^drop
Definition 3.17: Let U = <pu1, pu2 ... pun> and U′ = <pu1′, pu2′ ... pum′> be two sets of PUs, pui, puj′ ∈ Pu, i = 1...n, j = 1...m, m ≤ n. The relation ⊒a "more or equally abstract than" on such sets is defined as follows: if U′ ⊆ U then U′ ⊒a U.
Definition 3.18: Let pu = A(U) and pu′ = A(U′) be two PUs, where pu, pu′ ∈ Pu (aggregations with the sets of PUs U and U′).
F_struct^drop : Pu → Pu, pu′ = F_struct^drop(pu) iff U′ ⊒a* U
3.4.6 Relaxing Content Data types by adding a disjunct: F_dis^d
Definition 3.19: If D is the domain of content data types, the relation ⊒a "more or equally abstract than" on D is defined as follows: ∀ d, d′ ∈ D : d ∨ d′ ⊒a d.
For instance, if D = {CONTENT, MULTIPLEXED_CONTENT, INPUT, OUTPUT, VISUAL OBJECT, IMAGE, VIDEO, AUDIO, ANIMATION, TEXT, GRAPHICS, PICKER, HOTSPOT, SELECTABLE_CONTENT, STRING, VALUATOR, SELECTOR, MENU, EVENTER, BUTTON, SLIDESHOW} then:
(1) CONTENT ⊒a d for d ∈ {VISUAL OBJECT, IMAGE, VIDEO, AUDIO, ANIMATION, TEXT, GRAPHICS, CONTENT}
(2) MULTIPLEXED_CONTENT ⊒a d for d ∈ {VIDEO, ANIMATION, SLIDESHOW, MULTIPLEXED_CONTENT}
(3) INPUT ⊒a d for d ∈ {PICKER, HOTSPOT, SELECTABLE_CONTENT, STRING, VALUATOR, SELECTOR, MENU, EVENTER, BUTTON, INPUT}
(4) OUTPUT ⊒a d for d ∈ {CONTENT, VISUAL OBJECT, IMAGE, VIDEO, AUDIO, ANIMATION, TEXT, GRAPHICS, OUTPUT}
(5) PICKER ⊒a d for d ∈ {HOTSPOT, SELECTABLE_CONTENT, PICKER}
(6) SELECTOR ⊒a d for d ∈ {MENU, SELECTOR}
(7) EVENTER ⊒a d for d ∈ {BUTTON, EVENTER}
(8) ∀ d ∈ D, "not d" ⊒a every d′ ∈ D − {x | d ⊒a x}
Definition 3.20: Let pu{d1, d2 ... dn} denote a pu ∈ Pu with a set of <content data types> d1, d2 ... dn (e.g. pu = A(d1, d2 ... dn)).
F_dis^d : Pu → Pu, pu′{d1′, d2′ ... dn′} = F_dis^d(pu{d1, d2 ... dn}) iff ∀ di, ∃ di′ ∈ D : di′ ⊒a* di
3.4.7 SMA-to-SMA transformations on SMAs
An SMA is a sequence of presentational units (p_unit) and conceptual units (c_unit). A representation of an SMA in SMA-L, M_SMA-L, is a sequence of c_unit and p_unit declarations: M_SMA-L = <cu1, cu2 ... cun, pu1, pu2 ... pum>.
Definition 3.21: If F1, F2 ... Fn are SMA-to-SMA transformations -of any of the above types- on Pu, then F = F1 ∘ F2 ∘ ... ∘ Fn is also an SMA-to-SMA transformation on Pu (synthesis of SMA transformations).
Definition 3.22: If M_SMA-L = <cu1, cu2 ... cun, pu1, pu2 ... pum> then an SMA transformation F : ℳ_SMA-L → ℳ′_SMA-L is:
F(M_SMA-L) = F(<cu1, cu2 ... cun, pu1, pu2 ... pum>) = <F(cu1), F(cu2) ... F(cun), F(pu1), F(pu2) ... F(pum)>
Definition 3.23: If M_SMA-L and M′_SMA-L are two representations of an SMA, then M′_SMA-L ⊒a M_SMA-L iff ∃ F : M′_SMA-L = F(M_SMA-L).
3.4.8 Abstraction Hierarchies in an SMA space
The following hold for the relation ⊒a in an SMA space:
i. For all M_La ∈ ℳ_La, M_La ⊒a M_La (each SMA representation matches itself).
ii. For any M1_La, M2_La, M3_La ∈ ℳ_La, if M1_La ⊒a M2_La and M2_La ⊒a M3_La then M1_La ⊒a M3_La.
iii. For any M1_La, M2_La ∈ ℳ_La, if M1_La ⊒a M2_La and M2_La ⊒a M1_La then M1_La = M2_La.
The relation ⊒a "more or equally abstract than" is reflexive (i), transitive (ii) and antisymmetric (iii), so ⊒a is a partial ordering on ℳ_La. If an SMA space ℳ_La is a partially ordered set of SMAs with respect to ⊒a, then an SMA-to-SMA transformation F on ℳ_La defines a new partially ordered set with respect to ⊒a. Note that if an SMA-to-SMA transformation F is not meaningful when applied to an M_La then F(M_La) = M_La.
3.4.9 On Searching the abstract multimedia space: Query evaluation
A query to the SMA space ℳ_La is an SMA M_Query-L represented in a query language Query-L: Q = M_Query-L. Answers to the query Q are all the SMAs M_La ∈ ℳ_La for which there exists an SMA transformation F such that F(M_La) = Q.
As a null transformation is an SMA transformation, SMAs that match the query with no SMA transformations are
also answers to the query. The query language Query-L could be either a language for the representation of the
Multimedia Ground Space, L_g, or a language for the representation of SMAs, such as SMA-L.
i. If Q ∈ ℳ_Lg then the query is first abstracted onto an SMA representation by applying an F_base SMA
transformation: F_base(Q) = Q′, where Q′ ∈ ℳ_La.
Q′ is then used as the key value for searching the SMA space ℳ_La. Hence, instead of solving the query in the
complex Ground Multimedia Space, the query is mapped onto and solved in the Abstract Multimedia Search Space
(an SMA space). The solution is either used to guide search in the ground space (guided search) or contains links to
the multimedia applications that fall under these SMAs (approximate answers).
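The evaluation scheme can be sketched as follows (our sketch, not the implemented system): an SMA answers the query Q if some sequence of admissible SMA-to-SMA transformations maps it onto Q, and the number of transformations applied gives the similarity distance of Definition 3.26 below. SMAs are treated as opaque hashable values, and admissible_transformations and max_distance are hypothetical parameters.

# Sketch of query evaluation over an SMA space: an SMA matches the query if
# some sequence of admissible SMA-to-SMA transformations maps it onto Q.
from collections import deque

def evaluate_query(query, sma_space, admissible_transformations, max_distance=3):
    """Return {answer_sma: similarity distance} for SMAs matching the query."""
    answers = {}
    for sma in sma_space:
        # breadth-first application of transformations, shortest sequence first
        seen = {sma}
        frontier = deque([(sma, 0)])
        while frontier:
            current, distance = frontier.popleft()
            if current == query:
                answers[sma] = distance
                break
            if distance == max_distance:
                continue
            for transform in admissible_transformations:
                nxt = transform(current)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, distance + 1))
    return answers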
3.4.9.1 Relaxation of queries
If there is no SMA transformation F on an M_La such that F(M_La) = Q, then the query can be "relaxed" by applying SMA-to-SMA transformations on it. This can be used for answering approximate queries.
Definition 3.26: The similarity distance is the number of SMA-to-SMA transformations applied to a query Q or to an SMA M_La so that F(M_La) = Q.
3.4.9.2 Satisfying the need for mixed abstraction levels and the principles of Human Remembering
The proposed abstraction model allows queries that satisfy the parameters listed in 3.2.1 by means of the following:
- the use of the same language in the hierarchy of abstraction levels within the SMA space, which allows users to mix abstraction levels in the same query;
- the SMA-to-SMA transformations, which allow users to decide on the tightness of the constraints, on the complexity of abstraction constructs, on the temporal and spatial grain size, etc.;
- the various types of admissible SMA transformations, addressing different Presentational Views of multimedia applications; the Presentational View is determined by applying the relevant SMA transformation on those properties of SMAs that are considered of less importance for the specific goal;
- the use of a basic unit (the PU) for structuring SMAs and the definition of SMA transformations on PUs; this allows the decision about the PU to determine the desired grain size of the presentational structure in a way conformant to the user's perception of the application's presentational structure.
3.5 Application: Wrapping presentational structure of XML-based web documents
Extensible Markup Language (XML) [14], the standard developed by the W3C as a subset of SGML [49], enables
users to design their own markup languages and use them to create structured multimedia documents. The formal
definition of a particular class of documents (i.e., the markup language for this class of documents) is described in
the Document Type Definition (DTD), which is an attribute grammar. XML documents conform to their DTD;
document components are defined as XML “Elements”; nested elements in a DTD form a tree structure. Adding
semantics for specific domains is achieved via a DTD. Synchronized Multimedia Integration Language (SMIL)
[50] allows the integration of a set of independent multimedia objects into a synchronized multimedia presentation.
The syntax of SMIL documents is defined by an XML DTD.
As mentioned in Sect. 3.3.2, the complete definition of a basic SMA transformation F_base requires a specific
language L_g for the representation of the Ground Multimedia Space. In order to create an Abstract Multimedia
Space and experiment with it, we considered XML-based languages for the multimedia ground space (XML
DTDs and SMIL documents) and SMA-L for the SMA space, defined the set of admissible abstraction mappings
for F_base, and implemented a prototype system for automatically creating SMAs from XML DTDs and SMIL
documents [51]. The system consists of:
a) an "XML DTD-to-SMA" wrapper, which takes as input an XML DTD, parses it, extracts the conceptual
structure and part of the presentational structure of multimedia documents and produces an SMA-L representation
of them: F : ℳ_XML-DTD → ℳ_SMA-L, where an XML DTD is a set of Element type definitions. Each Element type
is mapped onto either a conceptual or a presentational unit in the corresponding SMA, depending on its attribute list.
b) a "SMIL document-to-SMA" wrapper, which takes as input a SMIL document, parses it, extracts from the SMIL
statements the presentational temporal and spatial structure of multimedia documents (including temporal and
spatial constraints) and produces the SMA representation of them: F : ℳ_SMIL → ℳ_SMA-L.
An example of an XML DTD and a SMIL document and their corresponding representations in SMA-L are given
below. Higher abstraction levels can be achieved by applying SMA-to-SMA transformations on the generated
SMAs.
XML DTD

<!ELEMENT landmark (Hotel | Museum | Castle)* >
<!ELEMENT Hotel (Room+, Hall*, Facilities*)>
<!ATTLIST Hotel
    src CDATA #REQUIRED
    name CDATA #REQUIRED>
<!ELEMENT Room EMPTY >
<!ATTLIST Room
    src CDATA #REQUIRED >
<!ELEMENT Hall (#PCDATA) >
<!ELEMENT Facilities (B+ | C) >
……

SMA representation in SMA-L

C_UNIT landmark
  TYPE GROUP OF (GENERIC(Hotel, Museum, Castle)): {0 or more}
C_UNIT Hotel
  TYPE AGGREGATION OF ((GROUP OF Room): {1 or more}, (GROUP OF Hall): {0 or more}, (GROUP OF Facilities): {0 or more})
  PRESENTED_BY AGGREGATION OF (CONTENT, name)
P_UNIT name TYPE TEXT
C_UNIT Room
  PRESENTED_BY CONTENT
C_UNIT Hall
  PRESENTED_BY TEXT
C_UNIT Facilities
  TYPE GENERIC ((GROUP OF B): {1 or more}, C)
………….
SMIL Document

<smil>
  <body>
    <seq>
      <par>
        <video src="video1" dur="10s"/>
        <audio src="audio1" dur="8s"/>
      </par>
      <par>
        <text src="text1" begin="6s"/>
        <audio src="audio2"/>
      </par>
    </seq>
  </body>
</smil>

[Timeline: video1 (10s) in parallel with audio1 (8s), followed by text1 (begin 6s) in parallel with audio2.]

SMA representation in SMA-L

P_UNIT xxx
  TYPE AGGREGATION OF (
    AGGREGATION OF (video, audio): T{starts},
    AGGREGATION OF (text, audio): T{parallel}
  ): T{meets}
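For illustration, a minimal SMIL-to-SMA sketch of ours (not the system of [51]) is given below: <par> and <seq> containers are mapped onto AGGREGATION OF constructs with T{parallel} and T{meets} constraints, and media elements onto content data types. It assumes well-formed SMIL with quoted attribute values and produces the coarser T{parallel} where the actual wrapper, by inspecting begin and dur attributes, could derive a finer relation such as T{starts}.

# Sketch of a SMIL-to-SMA mapping: <par>/<seq> containers become AGGREGATION OF
# constructs with T{parallel}/T{meets}, media elements become content data types.
import xml.etree.ElementTree as ET

MEDIA_TYPES = {"video": "VIDEO", "audio": "AUDIO", "img": "IMAGE", "text": "TEXT"}

def to_sma(element):
    """Recursively rewrite a SMIL body element as an SMA-L-like expression."""
    children = [to_sma(child) for child in element]
    if element.tag == "par":
        return "AGGREGATION OF (%s): T{parallel}" % ", ".join(children)
    if element.tag == "seq":
        return "AGGREGATION OF (%s): T{meets}" % ", ".join(children)
    return MEDIA_TYPES.get(element.tag, element.tag)

smil = """<smil><body><seq>
  <par><video src="video1" dur="10s"/><audio src="audio1" dur="8s"/></par>
  <par><text src="text1" begin="6s"/><audio src="audio2"/></par>
</seq></body></smil>"""

body = ET.fromstring(smil).find("body")
print("P_UNIT xxx\n  TYPE " + to_sma(body[0]))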
4. Conclusions / Open Issues
A query is equivalent to the specification of its answer. An abstract specification –all specifications are
abstractions of a sort- is equivalent to a query on some abstraction of the items specified. In this paper we
developed Semantic Multimedia Abstractions (SMA) which capture the conceptual and presentational properties of
multimedia applications. They have been used either as Model Title Specifications or as a model for looking up
multimedia applications.
Our main conclusion from this work –supported by the system developed in [51]- is that SMAs constitute a feasible
approach to the original problem, namely how to search large multimedia repositories for applications matching
conceptual and/or presentational characteristics given in mixed, end-user defined abstraction level.
The SMA definition and query language, or equivalently the SMA model, is quite satisfactory for the
representation of multimedia applications at that abstraction level which, being congruent with human memory, is
suitable for on-line searching by end users. On the other hand the use of SMA-L is not mandatory: several of the
existing languages, models and standards can be enhanced to produce equivalent representation and querying
mechanisms.
Two types of issues are left open by our work: on the one hand are theoretical issues, such as the completeness and
consistency of the abstract space; on the other are practical ones, such as the computational savings gained by using
abstractions, which should take into account the formation of the abstract space, itself a rather complex engineering
endeavour.
5. References
[1] S. Adali, P. Bonatti, M. Sapino, V.S Subrahmanian, “A Multi-Similarity Algebra”, Proc. SIGMOD ‘98 Conf. on
Management of Data, 1998, pp. 402-413.
[2] W. Al-Khatib, Y. F. Day, A. Ghafoor, P.B. Berra, “Semantic Modeling and Knowledge Representation in Multimedia
Databases”, IEEE Trans. Knowledge and Data Eng., vol. 11, no. 1, pp. 64-80, Jan./Feb. 1999.
[3] J. F. Allen, “Maintaining Knowledge about Temporal Intervals”, Communications of the ACM, Vol. 26, No. 11, Nov.
1983.
[4] K. Autio, “Abstraction of behavior and structure in model-based diagnosis”, Proc. DX-95 The 6th Int’l Workshop on
Principles of Diagnosis, Goslar, Germany, Oct. 1995.
[5] R. Bergmann and W. Wilke, “Building and Refining Abstract Planning Cases by Change of Representation Language”,
Journal of Artificial Intelligence Research 3, 1995., pp. 53-118.
[6] K. Bohm and T. Rakow, “Metadata for Multimedia Documents”, Special issue of SIGMOD Record, Dec. 1994.
[7] S. Boll, W. Klas, “ZYX – A semantic model for multimedia documents and presentations”, Proc. 8th IFIP Conference on
Data Semantics (DS-8): “Semantic Issues in Multimedia Systems”, Jan. 5-8, New Zealand, 1999.
[8] A. Borgida, J. Mylopoulos, H. Wong, “Generalization/Specialization as a Basis for Software Specification”, “On
Conceptual Modelling”, edited by Brodie M., Mylopoulos J., Schmidt, Springer-Verlag, 1982, pp. 87-117.
[9] M. Brodie, D. Ridjanovic, “On the Design and Specification of Database Transactions”, “On Conceptual Modelling”,
edited by Brodie M., Mylopoulos J., Schmidt, Springer-Verlag, 1982, pp. 277-306.
[10] W. W. Chu, C.C. Hsu, A.F. Cardenas, R.K.Taira, “Knowledge-Based Image Retrieval with Spatial and Temporal
Constructs”, IEEE Trans. Knowledge and Data Eng., vol. 10, no. 6, Nov/Dec 1998.
[11] V.Delis, Th. Hadzilacos, “Binary String Relations: A Foundation for Spatio-Temporal Knowledge Representation”, Proc.
ACM Conf. On Information and Knowledge Management (CIKM), 1999.
[12] J.D. Dionisio and A. F. Cardenas, “A Unified Model for Representing Multimedia, Timeline and Simulation Data”, IEEE
Trans. Knowledge and Data Eng., vol. 10, no. 5, Sept./Oct. 1998
[13] J. M. Egenhofer and R. J. Herring, “Categorizing Binary Topological Relationships Between Regions, Lines and Points in
Geographic Databases”, Tech. Report, Department of Surveying Engineering, Un. of Maine, 1990.
[14] Extensible Markup Language (XML), W3C Recommendations XML 1.0, Feb. 1998, http://www.w3.org/XML/.
[15] R. Fikes and N.J. Nilsson, "STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving", Artificial
Intelligence, Vol. 2, pp. 189-208, 1971.
[16] D. Gardelis, Th. Hadzilacos, P. Kourouniotis, M. Koutlis, E. Megalou, "Automating the generation of multimedia titles",
Proc. 10th Int'l Conf. on Advanced Science and Technology (ICAST'94), Chicago, USA, Mar. 1994.
[17] F. Giunchiglia and T. Walsh, “A theory of Abstraction”, Artificial Intelligence, Vol.56, No. 2-3, pp. 323-390, 1992.
[18] W. Grosky, F. Fotouhi, I. Sethi, "Using Metadata for Intelligent Browsing of Structured Media Objects", SIGMOD Record,
Vol. 23, No. 4, Dec. 1994.
[19] N. Hirzalla, O. Megzari, A. Karmouch, “An Object-Oriented Data Model and a Query Language for Multimedia
Databases”, IEEE ICECS’95, Dec. 1995.
[20] J.R. Hobbs, “Granularity”, Proc. 9th Int’l Joint Conference on Artificial Intelligence (IJCAI), pp. 432-435, 1985.
[21] R.C. Holte, C. Drummond, M.B. Perez, R.M. Zimmer, A.J. MacDonald, “Searching with Abstraction: A unifying
Framework and a New High Performance Algorithm”, Proc. 10th Canadian Conf. on AI, pp. 263-270, 1994.
[22] R.C. Holte, T. Mkadmi, “Speeding up Problem Solving by Abstraction: A Graph Oriented Approach”, Special Issue of
Artificial Intelligence, (spring 1996) on Empirical AI, edited by P. Cohen and B. Porter.
[23] T. Imielinski, “Domain Abstraction and Limited Reasoning”, Proc. 10th Int’l Joint Conf. on AI, 1987, pp. 997-1003.
[24] ISO/IEC 10166-2:1991, Document Filing and Retrieval (DFR) - Part 2: Protocol Specification.
[25] Y. Iwasaki, “Reasoning with Multiple Abstraction Models”, Proc. 4th Int’l Workshop on Qualitative Physics, 1990.
[26] H. Jagadish, A. Mendelzon, T. Milo, "Similarity-Based Queries", Proc. ACM PODS, San Jose, May 1995, pp. 36-45.
[27] P.N. Johnson-Laird, "Mental Models", Harvard University Press, 1983.
[28] V. Kashyap, K. Shah, A. Sheth, "Metadata for building the MultiMedia Patch Quilt", in "Multimedia Database Systems:
Issues and Research Directions", S. Jajodia and V.S. Subrahmanian, Eds., Springer-Verlag, 1995.
[29] W. Klas, E.J. Neuhold, and M. Schrefl, “Using an Object-Oriented Approach to Model Multimedia Data”, Computer
Comm. Vol. 13, no. 4, pp.204-216, May 1990.
[30] W. Klas and A. Sheth, "Metadata for Digital Media", Special issue of SIGMOD Record, Dec. 1994.
[31] C.A. Knoblock, “Automatically Generating Abstractions for Planning”, Artificial Intelligence, Vol. 68, No. 2, 1994.
[32] A. Levy, “Creating Abstractions Using Relevance Reasoning”, Proc. 12th National Conf. on AI, Seattle, Aug. 1994.
[33] J. Z. Li, M.T. Özsu, D. Szafron, V. Oria, "MOQL: A Multimedia Object Query Language", Proc. 3rd Int’l Workshop on
Multimedia Information Systems, Como, Italy, Sept. 1997, pp. 19-28.
[34] T.D.C. Little and A. Ghafoor, “Interval-Based Conceptual Models for Time-Dependent Multimedia Data”, IEEE Trans. Knowledge and Data Eng., Vol. 5, No. 4, 1993.
[35] Macromedia Director, Macromedia Inc., http://www.macromedia.com
[36] S. Marcus, V.S. Subrahmanian, “Foundations of Multimedia Database Systems”, JACM, vol. 43, no. 3, pp. 474-523, 1996.
[37] E. Megalou and Th. Hadzilacos, “On Conceptual Modeling for Interactive Multimedia Presentations”, Proc. 2nd Int’l Conf. on Multimedia Modeling ’95 (MMM’95), Singapore, p. 51, World Scientific.
[38] E. Megalou, Th. Hadzilacos, N. Mamoulis, "Conceptual Title Abstractions: Modeling and Querying Very Large
Interactive Multimedia Repositories", Proc. 3rd Int’l Conf. on Multi-Media Modeling (MMM'96), Toulouse, 1996.
[39] C. Meghini, F. Rabitti, C. Thanos, “Conceptual Document Modeling and Retrieval”, Computer Standards & Interfaces, Vol. 11, 1990/91, pp. 195-213.
[40] I. Mozetic, I. Bratko, T. Urbancic, “Varying Levels of Abstraction in Qualitative Modelling”, Machine Intelligence, Vol.
12, Clarendon Press, 1991, pp. 259-280.
[41] P. Nayak, A. Levy, “A Semantic Theory of Abstraction”, Proc. Int’l Joint Conf. on AI, Montreal, Canada, 1995.
[42] S. B. Navathe, “Evolution of Data Modeling for Databases”, Communications of the ACM, vol. 35, no. 9, Sept. 1992.
[43] R.J. Peters, A. Lipka, M.T. Özsu, D. Szafron, “The Query Model and Query Language of TIGUKAT”, Tech. Rep. 93-01, Dept. of CS, Univ. of Alberta, Jan. 1995.
[44] D.A. Plaisted, “Theorem Proving with Abstraction”, Artificial Intelligence, Vol. 16, pp. 47-108, 1981.
[45] A. Prieditis and B. Janakiraman, “Generating Effective Admissible Heuristics by Abstraction and Reconstitution”, Proc.
AAAI, 1993, pp. 743-748.
[46] J. Rumbaugh et al., “Object-Oriented Modeling and Design”, Prentice Hall, 1991.
[47] E.D. Sacerdoti, “Planning in a Hierarchy of Abstraction Spaces”, Artificial Intelligence, Vol. 5, 1974, pp. 115-135.
[48] G. Schloss, M. Wynblatt, “Providing definition and temporal structure for multimedia data”, ACM Multimedia Systems, vol. 3, no. 5/6, Nov. 1995.
[49] Standard Generalized Markup Language (SGML), International Standard ISO 8879.
[50] Synchronized Multimedia Integration Language (SMIL), W3C Recommendation SMIL 1.0, 1998,
http://www.w3.org/TR/REC-smil/
[51] G. Sygletos, “Abstracting XML-based documents to Semantic Multimedia Abstractions”, Diploma Thesis, Dept. of Computer Engineering and Informatics, Univ. of Patras, Greece, 1999 (in Greek).
[52] N. Tryfona and Th. Hadzilacos, “Geographic Applications Development: Models and Tools for the Conceptual Level”, Proc. 3rd ACM GIS Workshop, CIKM’95, Baltimore, Maryland, USA, Dec. 1995.
[53] ValMMeth, “Validation of a Multimedia Title Series Production Methodology”, Innovation Program IN34D, DGXIII,
European Commission, CTI-R&D Unit 3, 1997-98.
[54] M. Vazirgiannis, Y. Theodoridis, T. Sellis, “Spatio-Temporal Composition in Multimedia Applications”, Proc. Int’l Workshop on Multimedia Software Development, Berlin, IEEE-ICSE, 1996.
[55] K. Wittenburg, “Introduction: Abstraction in Multimedia”, ACM Workshop on Effective Abstractions in Multimedia, 1995.