MPEG-7 Applications Document

INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N3934
January 2001/Pisa
Title: MPEG-7 Applications Document v.10
Source: Requirements Group (Anthony Vetro, Editor)
Status: Approved
MPEG-7 Applications
1. Introduction
2. MPEG-7 Framework
3. MPEG-7 Application Domains
4. “Pull” Applications
   4.1 Storage and retrieval of video databases
   4.2 Delivery of pictures and video for professional media production
   4.3 Commercial musical applications (Karaoke and music sales)
   4.4 Sound effects libraries
   4.5 Historical speech database
   4.6 Movie scene retrieval by memorable auditory events
   4.7 Registration and retrieval of mark databases
5. “Push” Applications
   5.1 User agent driven media selection and filtering
   5.2 Personalised television services
   5.3 Intelligent multimedia presentation
   5.4 Personalizable browsing, filtering and search for consumers
   5.5 Information access facilities for people with special needs
6. Specialised Professional and Control Applications
   6.1 Teleshopping
   6.2 Bio-medical applications
   6.3 Universal access
   6.4 Remote sensing applications
   Semi-automated multimedia editing
   6.5 Educational applications
   6.6 Surveillance applications
   6.7 Visually-based control
7. Applications Based on Specific Descriptors & Description Schemes
   7.1 Transcoding applications
8. References
Annex A: Supplementary References, by Application
   4.1 & 4.2 Storage and retrieval of video databases & Delivery of pictures and video for professional media production
   5.1 User agent driven media selection and filtering
   5.2 Intelligent multimedia presentation
   6.3 Universal access
   6.5 Educational applications
Annex B: An Example Architecture for MPEG-7 Pull Applications
1. Introduction
This ‘MPEG-7 Applications Document’ describes a number of applications that are
enabled by MPEG-7 tools. It certainly does not list all the applications enabled by
MPEG-7, but rather gives an idea of the potential applications that could be
supported.
The purpose of this document is to provide a better understanding of what the
MPEG-7 standard is, and the functionality that could be delivered. In the past, this
document was used as a tool to help write concrete requirements. For this reason, the
earlier sections that describe the push/pull and other specialised applications not only
include the description of the application provided, but also application-specific
requirements and the requirements that were imposed on the MPEG-7 standard. These
requirements are kept in place since they do indeed provide some important insight
regarding the development of MPEG-7.
In October 2000, MPEG-7 reached the stage of Committee Draft (CD). This stage
indicates a mature specification, and, for the first time, the specification is made
publicly available. In section 7 of this document, applications that are based on actual
descriptors and description schemes that are part of the specification are described.
2. MPEG-7 Framework
Today, more and more audio-visual information is available from many sources
around the world. Also, there are people who want to use this audio-visual
information for various purposes. However, before the information can be used, it
must be located. At the same time, the increasing availability of potentially
interesting material makes this search more difficult. This situation created the
need for a way to search quickly and efficiently for the various types of
multimedia material that interest a user, and MPEG-7 is designed to answer that
need. Moreover, MPEG-7 enables not only this kind of search but also filtering;
thus, MPEG-7 will support both pull and push applications.
MPEG-7, formally called ‘Multimedia Content Description Interface’, standardises:
• A set of description schemes and descriptors;
• A language to specify description schemes, i.e. a Description Definition
  Language (DDL); and
• A scheme for coding the description.
For more details regarding the MPEG-7 background, goals, areas of interest, and work
plan please refer to document N2729, “MPEG-7 Context, Objectives, and Technical
Roadmap” [1]. MPEG-7’s requirements and definitions are indicated in document
N2727, “MPEG-7 Requirements Document” [2].
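The three normative elements of Section 2 can be pictured as a simple data model. The sketch below is illustrative only: all class and field names are invented for this example, not taken from the standard. Descriptors capture individual features; description schemes compose descriptors and other schemes; in MPEG-7 itself, the DDL (an XML-Schema-based language) plays the role of these type definitions, and a coding scheme serialises the resulting description.

```python
# Illustrative sketch only: names are invented, not from the standard.
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Descriptor:
    name: str            # e.g. "Title", or a visual feature name
    value: Any           # the feature value itself

@dataclass
class DescriptionScheme:
    name: str
    descriptors: List[Descriptor] = field(default_factory=list)
    sub_schemes: List["DescriptionScheme"] = field(default_factory=list)

# A toy description of one video segment.
clip = DescriptionScheme(
    name="VideoSegment",
    descriptors=[Descriptor("Title", "Evening news"),
                 Descriptor("DominantColor", (12, 40, 200))],
)
```

The nesting of schemes within schemes is what lets a single description cover a whole programme down to individual shots.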
3. MPEG-7 Application Domains
The increased volume of audio-visual data available in our everyday lives requires
effective multimedia systems that make it possible to access, interact with and
display complex and inhomogeneous information. Such needs are related to important
social and economic issues, and are imperative in various professional and
consumer applications such as:
• Education,
• Journalism (e.g. searching speeches of a certain politician using his name, his
  voice or his face),
• Tourist information,
• Cultural services (history museums, art galleries, etc.),
• Entertainment (e.g. searching a game, karaoke),
• Investigation services (human characteristics recognition, forensics),
• Geographical information systems,
• Remote sensing (cartography, ecology, natural resources management, etc.),
• Surveillance (traffic control, surface transportation, non-destructive testing in
  hostile environments, etc.),
• Bio-medical applications,
• Shopping (e.g. searching for clothes that you like),
• Architecture, real estate, and interior design,
• Social (e.g. dating services), and
• Film, video and radio archives.
4. “Pull” Applications
A preliminary note on the division of this document:
There is a multitude of ways of dividing this group of applications into
different categories. Originally, applications were divided by medium, but later
were categorised by delivery paradigm. This is not to imply an ordering or
priority of divisions, but is simply a reflection of what was convenient at the
time. The list of applications could also be divided by content type, user
group, or position in the content.
MPEG-7 began its life as a scheme for making audio-visual material “as searchable as
text is today.” Although the proposed multimedia content descriptions are now
acknowledged to serve much more than search applications, they remain for many the
primary applications for MPEG-7. These retrieval, or “pull”, applications involve
databases, audio-visual archives, and the web-based Internet paradigm (in which a
client requests material from a server).
4.1 Storage and retrieval of video databases
Application Description
Television and film archives store a vast amount of multimedia material in several
different formats (digital or analogue tapes, film, CD-ROM, etc.) along with precise
descriptive information (meta-data) which may or may not be precisely timecoded.
This meta-data is stored in databases with proprietary formats. There is an enormous
potential interest in an international standard format for the storage and exchange of
descriptions that could ensure:
• interoperability between video archive operators,
• perennial relevance of the meta-data, and
• a wider diffusion of the data to the professional and the general public.
MPEG-7, in short, must accommodate visual and other search of such existing
multimedia databases.
In addition, a vast amount of the older, analogue audio-visual material will be
digitised in years to come, which creates a tremendous opportunity to include
content-based indexing features (which can be extracted during the
digitisation/compression process) into those existing databases.
In the case of new audio-visual material, the ability to associate descriptive
information within video streams at various stages of video production can
dramatically improve the quality and productivity of manual, controlled-vocabulary
annotation of video data in a video archive. For example, pre-production and
post-production scripts, information captured or annotated during shooting, and
post-production edit lists would be very useful in the retrieval and re-use of
archival material.
Essential associated activities to this one are cost-efficient video sequence indexing
and shot-level indexing for stock footage libraries [3].
A sample architecture is outlined in Annex B.
Application requirements
Specific requirements for those applications are:
• Support of full-text descriptions as well as structured fields (database
  descriptions);
• Multi-language support;
• The ability to interoperate between different content description semantics
  (e.g. different database schemas, different thesauri, etc.), or to translate
  from each content description semantic into MPEG-7 semantics;
• The ability to reference audio-visual objects or object instances and time
  references, even in analogue format;
• The ability to include descriptions with incomplete or missing time references
  (e.g. a shot description that has not been timecoded);
• The ability to handle multiple versions of the same document at several stages
  in the production process, and descriptions that apply to multiple copies of the
  same material.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• There must exist a class of descriptors that support unstructured (free-) text
  in multiple languages.
  A note about text: descriptors should depend as little as possible on a specific
  language. If text is needed as a descriptor, the language used must be specified
  in the text description, and a text description may contain several
  translations. The character set chosen must enable the use of all languages (as
  appropriate to ISO).
• There must exist a class of descriptors that support structured text.
• There must exist a mechanism by which different MPEG-7 DSs can interoperate.
• There must be a robust linking mechanism that allows for temporal references to
  material with incomplete timecode.
• There must be a mechanism by which document versions may be identified.
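The multilingual free-text requirement above can be illustrated with a small sketch. This is a hypothetical data structure, not anything defined by the standard: each translation is stored under a language tag, and a lookup can fall back to another language when the requested one is missing.

```python
# Hypothetical sketch of a multilingual free-text descriptor.
class MultilingualText:
    def __init__(self):
        self._texts = {}                  # language tag -> text

    def add(self, lang, text):
        self._texts[lang] = text

    def get(self, lang, fallbacks=("en",)):
        """Return the text in `lang`, falling back to other languages."""
        if lang in self._texts:
            return self._texts[lang]
        for fb in fallbacks:
            if fb in self._texts:
                return self._texts[fb]
        # last resort: any available translation
        return next(iter(self._texts.values()), None)

desc = MultilingualText()
desc.add("en", "A politician addressing the United Nations")
desc.add("fr", "Un homme politique s'adressant aux Nations Unies")
```

A query formulated in German would here fall back to the English annotation rather than fail, which is the behaviour the multi-language requirement is after.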
Application relevant work and references
Bloch, G. R. (1988). From Concepts To Film Sequences. In RIAO 88, (pp. 760-767).
MIT Cambridge MA.: March 21-24, 1988.
Davis, M. (1993). Media streams: An iconic visual language for video annotation.
Telektronikk, 89(4), 59 - 71.
EBU/SMPTE Task Force, First Report: User Requirements, version 1.22, Chapter 1:
Compression; Chapter 2: Metadata and File Wrappers; Chapter 3: Transfer
Protocols. SMPTE Journal, April 1997.
Parkes, A. P. (1989b). The Prototype CLORIS system: Describing, Retrieving and
Discussing Videodisc Stills and Sequences. Information Processing and
Management, 25(2), 171 - 186.
ISO/TC 46 /SC 9, Information and documentation - Presentation, identification and
description of documents
<http://www.nlc-bnc.ca/iso/tc46sc9/index.htm>
ISO/TC 46 /SC 9, Working Group 1 (1997) Terms of reference and tasks for the
development of an International Standard Audio-visual Number (ISAN).
Document ISO/TC 46/SC 9 N 235, May 1997.
See also Annex A.
4.2 Delivery of pictures and video for professional media production
Application description
[note: this section is still to be re-evaluated]
Studios need to deliver appropriate videos to TV channels. The studio may have to
deliver a whole video, based on some global meta-data, or video segments, for
example to edit an archive-based video, or a documentary, or advertisement videos.
In this application, due to the users’ expertise, one formulates relevant and possibly
detailed “pull” queries, which specify the desired features of some video segments.
With present video databases, these queries are mainly based on objective
characteristics at segment level. However, they can also take advantage of subjective
characteristics of these segments, as perceived by one or several users.
The ability to formulate a single query on the client side, and send it to many
distributed databases is very important to many production studios. The returned items
should include visual abstracts, copyright and pricing information, as well as a
measure of the technical quality of the source video material.
In this application, one should separate news programs, which must be made widely
and instantly available for a short period of time, from other production programs,
which can be retrieved on a permanent basis, usually from secondary or tertiary
storage. On-line news services providing instant access to the day’s news footage are
being built by many broadcasters and archives (including INA, BBC, etc), using
proprietary formats, and would benefit from standardisation if they were to be
consolidated into common services such as the Eurovision News Exchange (which
currently uses broadcast channels, not databases).
Still pictures give rise to similar applications and requirements, particularly
in design. A web designer must not only create new designs but also collect
graphics already available on the net for use in the web sites being designed.
Other design fields have similar uses for visual search.
Application requirements
Requirements are similar to the previous application. They are mainly
characterised by:
• Support of feature-based and concept-based queries at segment level,
• Support for similarity queries,
• Support for different data formats,
• Support for art & design specific parameters, and
• Media summaries for fast browsing.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• There must exist a mechanism that embodies conceptual knowledge.
• There must exist a mechanism that allows for links to media summaries.
Application relevant work and references
Aigrain, P., Joly, P., & Longueville, V. (1995). Medium Knowledge-Based
Macro-Segmentation of Video into Sequences. In M. Maybury (Ed.) (pp. 5-16),
IJCAI 95 - Workshop on Intelligent Multimedia Information Retrieval. Montréal:
August 19, 1995.
The Art Teacher Connection
http://www.primenet.com/~arted
Cohen, A., Levy, M., Roeh, Itzhack, Gurevitch, M. (1995) Global Newsrooms, Local
Audiences, A study of the Eurovision News Exchange, Acamedia Research
Monograph 12, John Libbey.
European League of Institutes of the Arts
http://www.elia.ahk.nl
Pentland, A. P., Picard, R., Davenport, G., & Haase, K. (1994). Video and Image
Semantics: Advanced Tools for Telecommunications (Technical Report No.
283). MIT.
Sack, W. (1993). Coding News And Popular Culture. In The International Joint
Conference on Artificial Intelligence (IJCA93) Workshop on Models of
Teaching and Models of Learning. Chambery, Savoie, France.
Zhang, H., Gong, Y., & Smoliar, S. W. (1994). Automated parsing of news video. In
IEEE International Conference on Multimedia Computing and Systems, (pp.
45 - 54). Boston: IEEE Computer Society Press.
see also 4.1 ‘Application relevant work and references’, and Annex A.
4.3 Commercial musical applications (Karaoke and music sales)
Application Description
The Karaoke industry is extremely large and popular. One of the aims of the pastime
is to make the activity of singing in public as effortless and unintimidating as possible.
Requiring a participant to recall the name and artist of a popular tune is unnecessary
when one considers that the amateur performer must know the song well enough to
sing it. A much friendlier interface results if you allow someone to hum a few
memorable bars of the requested tune, and to have the computer find it (or a short list
of alternatives, if the brief segment under-specifies the intended selection).
Alternatively, one may speak the title and author, lyrics, or lyrical key words.
A similar application dealing with Karaoke, but also relating to music sales, below, is
enabling solo Karaoke-ists to expand their repertoire in the privacy of their own home.
Much of the industry is currently driven by people wishing to practice in their
own homes. One can easily imagine a complete on-line database in which someone
selects a song she knows from the radio, sings a few bars, and the entire
arrangement is downloaded to her computer, with appropriate payment extracted.
Another consumer music application is a rethinking of the consumer music industry in
the on-line world. The current model for music distribution is to go to a store,
purchase a music compact disk, bring it home and play it. An alternative model of
growing popularity is to download a compressed song from the Internet, store it on
your computer hard drive, and play it via a real-time software decoder. Even at the
current density of computer hard drives, more than 130 CDs can be stored on a 6 GB
drive (at 128 kb/s for a stereo signal). However, the personal computer is not a home
appliance that is highly compatible with a home music system (that fan noise! and
why is it taking so long to boot up?). An additional issue with this new model of
music distribution is that when reliability problems occur, you don’t lose just a song
or two from a scratch on the CD, but an entire library of 130 CDs in the event of a
hard drive crash.
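The storage figure quoted above ("more than 130 CDs" on a 6 GB drive at 128 kb/s) can be checked with a little arithmetic; the answer depends on the assumed album length, and the quoted figure corresponds to albums of roughly 45 minutes. A quick sketch:

```python
# Checking the drive-capacity figure quoted in the text.
BITRATE_BPS = 128_000            # 128 kb/s stereo stream
DRIVE_BYTES = 6 * 10**9          # 6 GB drive

def albums_on_drive(minutes_per_album):
    """How many albums of the given length fit on the drive?"""
    bytes_per_album = BITRATE_BPS / 8 * minutes_per_album * 60
    return int(DRIVE_BYTES // bytes_per_album)

print(albums_on_drive(45))   # about 138 albums of 45 minutes
print(albums_on_drive(60))   # about 104 albums of a full hour
```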
However, with the advent of high-speed networking, the repository of compressed
music can be moved from the home to a secure and reliable server in the network.
This has several advantages. First, a network-based storage device could be made
very reliable at a modest cost per-user. Second, the music appliance in the home
would not have to be a computer (with associated mass storage device), but merely a
small, networked, audio decoder (no fan, and it runs the moment you turn it on!). A
home network link of more than 1 Mb/s downstream has sufficient capacity to deliver
a 128 kb/s compressed music bitstream to the home with reliable quality of service.
Assume that the consumer has a multi-media personal computer and a high-speed
network connection. When shopping for music on the Internet, there are already many
sites that will let the prospective buyer browse via artist, album title or song title;
show a discography of the album or artist; permit short clips of songs to be
downloaded for preview; and recommend related artists or albums that the customer
might also like. Considering the magnitude of the effort required to create such a
database, a standardized format that would enable the cost to be shared by many retail
outlets might be very welcome.
When a song is purchased, the consumer does not download it to local storage, but
rather moves it to a networked server. In fact, a purchase could consist of an entry in
a server database that indicated that a song, perhaps at another location in the network,
is “owned” by the consumer. This poses a dilemma: many consumers have large
music libraries, which they might (or might not) catalog and organize in various
ways by means of the physical arrangement of the CDs. Possibilities include
alphabetical by artist or genre, categorized by mood, or simply frequency of play
(a LIFO stack). Without the physical jewel boxes and liner notes to look at,
these methods of organization are not possible. A method to browse the electronic
catalog via any of various means of organization is vital, and this proposal is a
first step toward satisfying that need.
Now assume that the music application is a subscription service: for a flat fee per
month, the consumer can play any music in the provider’s archive. This multiplies the
access problem enormously. The consumer can play not just the possibly thousands
of songs in his or her personal library of purchased music, but any of hundreds of
thousands of songs that are commercially available. An easy method of access is
vital. Furthermore, in such a service the consumer might typically compose song lists,
and then play the list during a listening session. An attractive new service would be to
automatically compose an “infinite” song list based on initial information provided by
the consumer. For example, “play cool jazz songs that are relaxing, like the songs
on the album ‘Paraiso – Jazz Brazil’ by Gerry Mulligan.” This proposal attempts to
specify the underlying database of information to facilitate such a service.
Application requirements
• Robust representations of melody and other musical features, which allow for
  reasonable errors on the part of the indexer, in order to accommodate
  query-by-humming,
• Associated information, and
• Cross-modal search.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• There must exist a mechanism that supports melody and other musical features.
• There must exist a mechanism that supports descriptors based on information
  associated with the data (e.g., textual data).
• Support for description schemes that contain descriptors of visual, audio,
  and/or other features, and support for links between the different media.
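The melody-robustness requirement above is commonly met in the query-by-humming literature with a contour representation. The sketch below shows one well-known approach (a Parsons-style up/down/repeat contour compared by edit distance, so indexing and singing errors cost only a small penalty); the tune data is invented for illustration.

```python
# Illustrative sketch: Parsons-style contour matching for query-by-humming.
def contour(pitches):
    """Map a pitch sequence to a U/D/R contour string."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        out.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(out)

def edit_distance(a, b):
    """Classic one-row dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def rank(query_pitches, catalogue):
    """Order (title, pitches) entries by contour distance to the query."""
    q = contour(query_pitches)
    return sorted(catalogue, key=lambda item: edit_distance(q, contour(item[1])))

# Invented catalogue: (title, pitch sequence in MIDI note numbers).
tunes = [("Tune A", [60, 62, 64, 62, 60]),
         ("Tune B", [60, 60, 67, 67, 69, 69, 67])]
```

A hummed query of [60, 62, 65, 62, 60], which gets one interval wrong, still has the contour "UUDD" and so still ranks Tune A first: the representation absorbs the error, which is exactly what the requirement asks of the descriptor.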
Application relevant work and references
The area of query-by-humming may be the most-researched field within auditory
query by content. A few example papers include:
Ghias, A., Logan, J., Chamberlain, D., Smith, B. C. (1995). “Query by humming:
musical information retrieval in an audio database,” ACM Multimedia ‘95, San
Francisco.
<http://www.cs.cornell.edu/Info/People/ghias/publications/query-by-humming.html>
Kageyama, T., Mochizuki, K., Takashima, Y. (1993). “Melody retrieval with
humming,” ICMC ‘93 Tokyo proceedings, 349-351.
Lindsay, A. (1996). “Using Contour as a Mid-Level Representation of Melody,” S.M.
Thesis, MIT Media Laboratory, Cambridge, MA.
http://sound.media.mit.edu/~alindsay/thesis.html
Quackenbush, S. (1999). “A Description Scheme and Associated Descriptors for
Music” ISO IEC/JTC1/SC29/WG11 Doc m4582/p471, presented at MPEG-7
Evaluation meeting, Lancaster UK.
Wu, J. K. (1999). “Speech Annotation – A Descriptive Scheme for Various Media”
ISO IEC/JTC1/SC29/WG11 Doc m4582/p629, presented at MPEG-7
Evaluation meeting, Lancaster UK.
4.4 Sound effects libraries
Application Description
Foley artists, sound designers, and the like must deal with extremely large databases
of sound effects to be used for a variety of applications daily. Existing database
management and search solutions are typically proprietary and therefore closed, or
open, and unsuitable for any serious, orderly work.
A sound designer may specify a sound effect type, for example, naming the source of
the sound, and select from variations on that sound. A designer may provide a
prototypical sound, and detail features such as, “bigger, more distant, but keeping the
same brightness.” One may even vocalise the type of abstract sound one seeks, in an
onomatopoetic variation of query-by-humming. Essential to the application is the
ability to navigate a space of similar sound effects.
Application requirements
• Compact representation of sound effects,
• Sound source name and characteristics, and
• Ability to specify classes of audio-visual objects, with features to
  accommodate selection.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for hierarchical descriptors and description schemes.
• Support for text-based descriptors.
• Efficient coding of descriptors.
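The idea of navigating a space of similar sound effects reduces, in the simplest case, to nearest-neighbour search over feature vectors. A hedged sketch follows, with invented feature names (brightness, loudness, duration) and a made-up effects library; a real system would extract such features from the audio itself.

```python
# Illustrative sketch: nearest-neighbour search over sound-effect features.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, library, k=3):
    """Return the k library entries (name, features) closest to the query."""
    return sorted(library, key=lambda entry: distance(query, entry[1]))[:k]

# Invented library: (name, (brightness, loudness, duration)).
library = [
    ("door slam",   (0.2, 0.9, 0.5)),
    ("glass break", (0.9, 0.8, 0.3)),
    ("wind gust",   (0.4, 0.3, 2.0)),
]

# "Bigger, more distant, but keeping the same brightness" becomes a query
# vector shifted along the loudness/duration axes only.
hits = nearest((0.2, 0.7, 0.6), library, k=2)
```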
Application relevant work and references
Blum, T., et al. (1997). “Content-based classification, search, and retrieval of
audio,” in Intelligent Multimedia Information Retrieval, Maybury, Mark T. (ed.).
Menlo Park, Calif.
Mott, R.L. (1990) “Sound Effects: Radio, TV, and Film,” Focal Press, Boston, USA.
4.5 Historical speech database
Application Description
One may search for historical events through key words spoken (“We will bury you”),
key events (‘shoe banging’), the speaker (‘Nikita Khrushchev’), location and/or context
(‘address to the United Nations’), date (12 October 1960), or a combination of any or
all of the above in order to call up an audio recording, an audio-visual presentation, or
any other associated facts. This application can aid in education (See also 6.4-Film
music education) or journalistic research.
Application requirements
• Representation of textual content of an auditory event.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for text-based descriptors.
Application relevant work and references
[to come]
4.6 Movie scene retrieval by memorable auditory events
Application Description
In our post-modern world, many visual events are referred to by memorable spoken
words. This is never more evident than when referring to comedic movie or
television scenes (“This parrot is bleedin’ demised,” and “land shark,”) or to
films by auteurs (“I’m sorry, did I ruin your concentration?” “thirty-seven!?” and
“there’s only trouble and desire.”). One should be able to look up a movie (and
rent a viewing of a particular scene, for example) by quoting such catch phrases.
It is not hard to
imagine a new market growing up around such micro-views and micro-payments,
based on impulse viewing.
In a similar vein, auditory events in soundtracks may be just as accessible as spoken
lines in certain circumstances. A key example is the screeching violins in the
“Psycho” soundtrack at the point of the infamous shower scene. Those repeated harsh
notes (“Scree-ee-ee-ee!”) are iconic to a movie-going public, and a key feature of an
important movie.
Application requirements
• Search by example audio.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for audio descriptors.
Application relevant work and references
[to come]
4.7 Registration and retrieval of mark databases
Application Description
Marks are registered to protect the inventor or service provider from misuse or
imitation, in the form of exclusive rights of exploitation enforceable through
legal proceedings. A mark is a sign, or a combination of signs, capable of
distinguishing the goods or services of one undertaking from those of other
undertakings. In general, the sign may be in the form of a two-dimensional image
that consists of text, drawings or pictures, or emblems, including colors.
Two-dimensional marks can be categorized into the following three types:
• Word-in mark: contains only characters or words. (Best described by text
  annotation.)
• Device mark: contains graphical or figurative elements only. (Shape descriptor
  needed.)
• Composite mark: consists of characters or words and graphical elements.
  (Combination of the above descriptors.)
If a mark is registered, then no person or enterprise other than its owner may use it for
goods or services identical with or similar to those for which the mark is registered.
Any unauthorized use of a sign similar to the protected mark is also prohibited, if such
use may lead to confusion in the minds of the public. The protection of a mark is
generally not limited in time, provided its registration is periodically renewed
(typically, every 10 years) and its use continues. The number of registered marks
is therefore expected to keep growing rapidly; indeed, it is estimated that the
number of registrations and renewals of marks effected worldwide in 1995 was of
the order of millions.
In order to register a mark, one has to make sure no identical ones have been
registered before. For the “Word-in mark” and “Composite mark” types, text
annotation may be adequate for retrieval from the database. The “Device mark”
type, however, is characterized only by the shape of the object. In addition,
this type may not have a distinct orientation or scale. When the operator enters
a new mark into the database for registration, he/she wants to make sure that no
identical one is already in the system, regardless of its orientation or scale.
Furthermore, he/she may want to see which similarly shaped marks are already in
the system even if there is no identical one. The search process should be robust
to noise in the image or minor variations in shape. Any relevant information,
such as annotation or a textual description of the mark, should also be
accessible on request.
A mark designer may want the same thing. In addition, to avoid possible inadvertent
infringement of the copyright, the designer may wish to see whether some possible
variations of the mark under design are already registered.
In this respect, it is desirable for the system to be capable of returning the
retrieved results ranked by similarity, and of displaying the results
simultaneously for comparison. At present, retrieval of similar or identical
device-type marks is performed manually by a human operator, resulting in many
duplicated registrations. There is therefore an enormous potential need for
automatic retrieval of marks by content-based similarity, both internationally
and domestically. The submission of the mark to the system can be done
interactively on-line to refine the search process in a web-based Internet
paradigm [4].
Application requirements
Specific requirements for those applications are:
• Efficient interactive response time.
• Support for a mechanism by which a mark image may be submitted for
  similarity-based retrieval.
• Support for visual-based descriptors by which modifications can be made to any
  of the retrieved results for fine-tuning the search process (relevance
  feedback).
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for shape-based and content-based queries;
• Support for precise, shape-oriented similarity queries;
• Support for scale and orientation independence of marks;
• Support for media summaries for fast browsing;
• Support for linking relevant information;
• Support for D's and DS's invariant under transformations irrelevant to the
intended features.
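The scale- and orientation-independence requirement above can be met with moment invariants. The sketch below is illustrative only, not part of the standard: it computes the first two Hu moment invariants of a binary mark image, which are unchanged under translation, scaling, and rotation (the function names are assumptions).

```python
import numpy as np

def hu_invariants(img):
    """First two Hu moment invariants of a binary shape image.

    These are unchanged under translation, scaling, and rotation --
    exactly the invariance a device-mark search needs."""
    ys, xs = np.nonzero(img)
    m00 = len(xs)
    x, y = xs - xs.mean(), ys - ys.mean()

    def mu(p, q):                      # central moment
        return np.sum(x**p * y**q)

    def eta(p, q):                     # scale-normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2
    return np.array([phi1, phi2])

def mark_distance(a, b):
    """Dissimilarity score between two mark images (smaller = more similar)."""
    return np.abs(hu_invariants(a) - hu_invariants(b)).sum()
```

A rectangle and a version of it scaled by a factor of two give nearly identical invariants, while a shape with a different aspect ratio scores much farther away; a real system would use more invariants and a learned threshold.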
Application relevant work and references
Andrews, B. (1990). U.S. Patent and Trademark Office ORBIT trademark retrieval
system. T-term user guide, examining attorney's version, October 1990.
Cortelazzo, G., Mian, G. A., Vezzi, G., & Zamperoni, P. (1994). Trademark
shapes description by string-matching techniques. Pattern Recognition, 27(8),
1005-1018.
Eakins, J. P. (1994). Retrieval of trademark images by shape feature. Proc. of Int.
Conf. on Electronic Library and Visual Information Research, 101-109, May
1994.
Eakins, J. P., Shields, K., & Boardman, J. (1996). ARTISAN – a shape retrieval
system based on boundary family indexing. Proc. SPIE, Storage and Retrieval
for Image and Video Database IV, vol. 2670, 17-28, Feb. 1996.
Lam, C. P., Wu, J. K., & Mehtre, B. (1995). STAR – a system for trademark
archival and retrieval. Proceedings 2nd Asian Conf. on Computer Vision, vol. 3,
214-217.
Kim, Y.-S., & Kim, W.-Y. (1998). Content-based trademark retrieval system using
visually salient feature. Image and Vision Computing, 16(12-13), August 1998.
WORLD INTELLECTUAL PROPERTY ORGANIZATION: (WIPO)
http://www.wipo.org/eng/dgtext.htm
5. “Push” Applications
In contrast with the above “pull” applications, the following “push” applications
follow a paradigm more akin to broadcasting, and the emerging webcasting. The
paradigm moves from indexing and retrieval, as above, to selection and filtering. Such
applications have very distinct requirements, generally dealing with streamed
descriptions rather than static descriptions stored on databases.
5.1 User agent driven media selection and filtering
Application description
Filtering is essentially the converse of search. Search involves the pull of information,
while filtering implies information ‘push.’ Search requests the inclusion of
information, while filtering excludes data. Both pursuits benefit strongly from the
same sort of meta-information.
Broadcast media are unlikely to disappear any time soon. In fact, there is a movement
to make the World Wide Web, primarily a pull medium, more broadcast-like. If we
can enable users to select information more appropriate to their uses and desires from
a broadcast stream of 500 channels, using the same meta-information as that used in
search, then this is an application for MPEG-7.
This application gives rise to several sub-types, primarily divided among types of
users. A consumer-oriented selection gives rise, for example, to personalised
audio-visual programmes; this can go much farther than typical video-on-demand
in collecting personally relevant news items. A content-producer-oriented
selection, made at the segment or shot level, is a way of collecting raw material
from archives.
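The selection-and-filtering paradigm described above can be sketched as follows. This is a toy model, not the normative MPEG-7 schema: the `Description` structure, the per-language concept sets, and the function names are all assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Description:
    """Toy stand-in for an item's content description: concept sets keyed
    by language code, so filtering can be locality- and language-aware."""
    title: str
    concepts: dict = field(default_factory=dict)  # e.g. {"en": {"news", "election"}}

def matches(desc, profile, lang="en"):
    """True if the item's concepts (in the user's language) overlap the profile."""
    return bool(desc.concepts.get(lang, set()) & profile)

def filter_stream(stream, profile, lang="en"):
    """Keep only items whose description matches the user's profile --
    the 'push' converse of a 'pull' search query."""
    return [d for d in stream if matches(d, profile, lang)]
```

The same concept sets could serve a search engine for inclusion and a user agent for exclusion, which is the point made above: both pursuits benefit from the same meta-information.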
Application requirements
• Efficient interactive response times, and
• The capability to characterise a media object by a set of concepts that may be
dependent on locality or language.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for descriptors and description schemes that allow multiple languages;
• There must exist a mechanism by which concepts may be represented.
Application relevant work and references
Lieberman, H. (1997) “Autonomous Interface Agent,” In proceedings of Conference
on Computers and Human Interface, CHI-97, Atlanta, Georgia.
Maes, P. (1994a) “Agents that Reduce Work and Information Overload,”
Communications of the ACM, vol. 37, no. 7, pp. 30 - 40.
Marx, M., & C. Schmandt (1996) “CLUES: Dynamic Personalized Message
Filtering,” In proceedings of Conference on Computer Supported Cooperative
Work /CSCW 96, edited by M. S. Ackermann, pp. 113 - 121, Hyatt Regency
Hotel, Cambridge, Mass.: ACM.
Nack, F. (1997) Considering the Application of Agent Technology for Collaboration
in Media-Networked Environments. IRIS20,
Shardanand, U., & P. Maes (1995) “Social Information Filtering: Algorithms for
Automating 'Word of Mouth',” In proceedings of CHI-95 Conference, Denver,
CO: ACM Press.
See also Annex A.
5.2 Personalised Television Services
Application Description
In the broadcast area, MPEG-7 descriptions can assist the user in the selection of
broadcast data, be it for immediate or later viewing, or for recording. In a
personalized broadcast scenario, the data offered to the user can be filtered from
broadcast streams according to his or her own profile, which may be generated
automatically (e.g. based on location, age, gender, or previous selection
behavior) or semi-automatically (e.g. based on pre-set interests). The broadcast of
MPEG-7 description streams will provide providers of Electronic Programme
Guides (EPGs) with a variety of capabilities, wherein presentation of MPEG-7
data (also along with the original AV data) will be an important aspect. In
combination with NVOD (Near-Video on Demand) services and recording, new
functionalities become possible, such as stepping forward/backward based on
keyframe selection and changing the sequence of scenes to speed up presentation.
Extended interactivity functionalities, related to specific events in the
programmes, are also of importance for future broadcast services. This can
include "offline" interactivity based on a recorded broadcast stream, which can
require identification of the associated event during callback. It can be expected
that MPEG-7 data will be transmitted along with the AV data streams, or
(e.g. for an EPG channel) as separate streams [5].
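The automatic profile generation mentioned above, based on previous selection behaviour, can be sketched as a simple genre-frequency model. The dictionary-based programme records and function names below are illustrative assumptions, not MPEG-7 structures.

```python
from collections import Counter

def learn_profile(viewing_history):
    """Build a user profile automatically from previous selection behaviour
    by counting how often each genre was watched (deliberately simplistic)."""
    return Counter(g for programme in viewing_history for g in programme["genres"])

def rank_epg(epg, profile):
    """Order upcoming programmes by affinity to the learned profile, so an
    EPG can surface the most relevant items first."""
    def score(p):
        return sum(profile.get(g, 0) for g in p["genres"])
    return sorted(epg, key=score, reverse=True)
```

A deployed system would of course combine such implicit signals with the semi-automatic, pre-set interests the text mentions, and would read genres from broadcast description streams rather than hand-built dictionaries.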
Application requirements
• Description of broadcast media objects in terms of, for example, content type,
author, cast, parental rating, textual description, temporal relationships,
locality- and language-dependent features, service provider, and IP protection;
• Description of specific events in broadcast media objects;
• Specification of interactivity capabilities related to specific events (e.g. polling)
and definition of associated interaction channels (e.g. telephone numbers or
hyperlinks);
• Presentation of content-related data, also along with the associated media
objects, and manipulation of the presentation based on the content description
(e.g. by event status);
• Support for APIs that define receiver-side filter functions, interaction
capabilities, or definition of extended content descriptions;
• Support for unique presentation of the media objects and the associated content
description, controllable by user interaction and description parameters;
• The broadcast metadata to be put into MPEG-7 streams must be flexible;
broadcasters must be able to "spice" the content based on available resources. As
little information as possible should be mandatory, so that broadcasters can start
the service without a huge investment. Upwards compatibility with existing
standards such as DVB-SI, ATSC PSIP, or metadata definitions originating from
the EBU/SMPTE task forces also falls under this aspect;
• Capability to set up content- or service-related links between different broadcast
media objects (not necessarily transmitted simultaneously).
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for descriptors and description schemes that allow definition of
broadcast media objects and events as listed above in their temporal (absolute and
relative time bases, duration of validity) and physical (channel, stream) context;
• Support for streaming of MPEG-7 data;
• Support for interactivity, including identification of the related event during a
callback procedure.
Application relevant work and references
[to come]
5.3 Intelligent multimedia presentation
Application Description
Given the vast and increasing amount of information available, people are seeking
new ways of automating and streamlining presentation of that data. That may be
accomplished by a system that combines knowledge about the context, user,
application, and design principles with knowledge about the information to be
displayed. Through clever application of that knowledge, one has an intelligent
multimedia presentation system.
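An intelligent presentation system needs some representation of events and the temporal relationships between them. A minimal sketch follows, using a simplified subset of Allen's interval relations; the `Event` structure and relation names are illustrative assumptions, not MPEG-7 constructs.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """An event on a media timeline, with times in seconds (illustrative)."""
    name: str
    start: float
    end: float

def relation(a, b):
    """Classify the temporal relationship between two events
    (a simplified subset of Allen's interval relations)."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start or b.end == a.start:
        return "meets"
    if a.start >= b.start and a.end <= b.end:
        return "during"
    if b.start >= a.start and b.end <= a.end:
        return "contains"
    return "overlaps"
```

A presentation planner could use such relations to decide, for example, that narration occurring *during* a scene must be kept synchronised with it when the material is re-arranged.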
Application requirements
• The ability to provide contextual and domain knowledge, and
• The ability to represent events and the temporal relationships between events.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• There must exist a mechanism by which contextual information may be
encoded;
• There must exist a mechanism by which temporal relationships may be
represented.
Application relevant work and references
André, E., & Rist, T. (1995). Generating Coherent Presentations Employing Textual
and Visual Material. Artificial Intelligence Review, Special Volume on the
Integration of Natural Language and Vision Processing, 9(2 - 3), 147 - 165.
Bordegoni, M., et al, “A Standard Reference Model for intelligent Multimedia
Presentation Systems,” April 1997, pre-print.
<http://www.dfki.uni-sb.de/~rist/csi97/csi97.html>
Davenport, G., & Murtaugh, M. (1995). ConText: Towards the Evolving
Documentary. In ACM Multimedia 95 - Electronic Proceedings. San
Francisco, California: November 5-9, 1995.
http://ic.www.media.edu/icPeople/murtaugh/acm-context/acmcontext.html
Feiner, S. K., & McKeown, K. R. (1991). Automating the Generation of Coordinated
Multimedia Explanations. IEEE Computer, 24(10), 33 - 41.
Maybury, M. T. (ed.) (1993) Intelligent Multimedia Interfaces. AAAI Press/ MIT
Press, Cambridge, MA.
Maybury, Mark T. (ed) (1997) Intelligent multimedia information retrieval. Menlo
Park, Calif.
See also Annex A.
5.4 Personalizable Browsing, Filtering and Search for Consumers
Application Description
Imagine coming home from work late Friday evening, happy that the week is over.
You want to catch up with the world and then watch ABC's 20/20 show later that
night. It is now 9 PM, and 20/20 will start in an hour, at 10 PM. You are
interested in the sports events of the week and in all the news about the local
elections. Your smart appliance has been selecting and recording TV broadcast
programs according to your general preferences all week long. It has collected a
large amount of content, definitely much more than the single hour you have. You
start interacting with the appliance. You first identify yourself by speaking your
name. The appliance uses voice verification to recognize you and then invokes
your user profile (which is indeed different from your spouse's and children's) to
customize your viewing experience and anticipate your needs.
You speak to your appliance “Show me the sports programs you have recorded”. At
that moment you are interacting with the browser of the system. On your large LCD
display, you see listed the following: Basketball and Soccer. Apparently, your favorite
football team’s game was not broadcast that week. You are interested in the basketball
games and express your desire to the appliance. On the screen there appears a Title
Frame and Title Text for each one of the games recorded. The Title Frame captures an
important moment of the game. The Title Text is a very brief summary, containing the
score, similar to those that can be found, for example, on the CNN-SI web site. You
are interested in the Chicago Bulls game; you want to spend no more than 15
minutes on sports, so you choose the 5-minute highlights option. You could also
have chosen to watch only the clips that contain "slam dunks." The system has
also captured information from the web about this particular game, and your
browser supports tools to enhance the audiovisual presentation with other types of
material, such as textual information. Five minutes are now up; you have seen the
most worthwhile moments of the game. You are impressed, and you decide to
archive the game: the system will record the entire program on a long-term
storage medium. Meanwhile, you are ready to watch the news about the local
elections.
It is now 9:50 PM, and you are done with the news. In fact, you have chosen to
delete all the recorded news items after seeing them. You remember to do one last
thing before 10 PM. The next day, you want to watch the digital camcorder tape
that you received from your brother that day, containing footage of his new baby
girl and his vacation in Spain last summer. You want to watch the whole 2-hour
tape, but you are anxious to see what the baby looks like, as well as the new
aquarium they built in Barcelona (which was not there when you visited Spain 10
years ago). You plan to take a quick look at a visual summary of the tape, browse,
and perhaps watch a few segments for a couple of minutes. Your camcorder is
already connected to your smart appliance. You put the tape in and start browsing
its contents using your smart appliance. You quickly discover the baby's looks by
finding and playing back the right segments. Thanks to standard (MPEG-7) means
of describing audiovisual information, your browser has no difficulty browsing
the contents of your brother's tape. It is now 10:10 PM; it seems you are 10
minutes late for 20/20. But fortunately, your appliance has been recording 20/20
since 10 PM, since it knows that you watch or record it every week. You start
watching the recorded program as the recording proceeds.
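The "5-minute highlights" option in the scenario can be sketched as a greedy selection of scored shots within a viewing-time budget. The `(importance, duration, label)` tuples below are an illustrative stand-in for scored segments in a content description, not an MPEG-7 structure.

```python
def select_highlights(segments, budget_s):
    """Greedily pick the most important highlight segments that fit within a
    viewing-time budget (in seconds).

    `segments` is a list of (importance, duration_s, label) tuples."""
    ranked = sorted(range(len(segments)),
                    key=lambda i: segments[i][0], reverse=True)
    chosen, used = set(), 0.0
    for i in ranked:
        dur = segments[i][1]
        if used + dur <= budget_s:
            chosen.add(i)
            used += dur
    # replay the chosen shots in their original temporal order
    return [segments[i][2] for i in sorted(chosen)]
```

The "slam dunks only" option mentioned in the scenario would simply pre-filter the segment list by an event label before the same budget selection is applied.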
Application Requirements
• There should be support for descriptors and description schemes which can
express the interaction between a user and multimedia material, such as the
user's preferences and other user-specific information.
MPEG-7 Specific Requirements
There are no additional requirements specific to MPEG-7 at this time.
Application relevant work and references
P. van Beek, R. Qian, I. Sezan, "MPEG-7 Requirements for Description of Users,"
ISO/IEC JTC1/SC29/WG11 Doc. M4601, MPEG Seoul.
5.5 Information access facilities for people with special needs
Application description
In our increasingly information-dependent society, we must make information
accessible to every individual user. However, some people face serious problems
in accessing information, not because they lack the economic or technical basis
but because they suffer from one or several disabilities, e.g. visual, auditory, motor,
or cognitive disabilities. Providing active information representations might help
to overcome these problems. The key issue is to allow multi-modal
communication to present information optimised for the abilities of individual
users.
Thus, it is important to develop technical aids that facilitate communication for
people with special needs. For example, a search agent could, rather than
excluding images as an information resource for the blind, make the MPEG-7
metadata available; aided by that metadata, sonification (auditory display) or
haptic display becomes possible. Similarity of metadata helps to provide a set of
information in different modalities, in case the particular information is not
accessible to the user.
Such applications provide full participation in society by removing
communication and information-access barriers that restrict interactions between
people with and without disabilities, and they will lead to improved global
commerce opportunities.
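The modality-substitution idea above can be sketched as follows; the modality labels, the resource mapping, and the function name are assumptions for illustration only.

```python
def accessible_version(available, user_modalities):
    """Pick the first representation of a content item the user can perceive.

    `available` maps modality -> resource id, ordered by the author's
    preference; `user_modalities` is the set of modalities accessible to
    this user (e.g. {"audio", "text"} for a blind user)."""
    for modality, resource in available.items():
        if modality in user_modalities:
            return resource
    return None   # nothing suitable: a real agent might request a conversion
```

Metadata similarity, as mentioned in the text, is what makes it possible to know that the sonified and textual resources describe the *same* content as the image they stand in for.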
Application requirements
[no new ones apparent]
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for description schemes that contain descriptors of visual, audio,
and/or other features.
Application relevant work and references
The Yuri Rubinsky Insight Foundation:
http://www.yuri.org/webable/library.html#guidlinesandstandards
The Centre of Cognitive Science:
http://www.cogsci.ed.ac.uk/
The Human Communication Research Centre:
http://www.hcrc.ed.ac.uk/
6. Specialised Professional and Control Applications
The following potential MPEG-7 applications do not limit themselves to
traditional, media-oriented multimedia content, but are functional within the
meta-content representation to be developed under MPEG-7. They reach into such
diverse, but data-intensive, domains as medicine and remote sensing. Such
applications can only serve to increase the usefulness and reach of this proposed
international standard.
6.1 Teleshopping
Application description
More and more merchandising is being conducted through catalogue sales. Such
catalogues are rarely effective if they are restricted to text. The customer who browses
such a catalogue is more likely to retain visual memories than text memories, and the
catalogue is frequently designed to cultivate those memories. However, given the
sheer size of many of these catalogues, and the fact that most people only have a
vague idea of what they want ("I'll know it when I see it"), they will only be effective
if it is possible to find items by successively refining and/or redirecting the search.
Typically, the customer will spot something that is almost right, but not quite. He or
she will then want to fine-tune the search-process by interacting with the system. E.g.
“I'm looking for brown shoes, a bit like those over there, but with a slightly higher
heel,” or “I'm looking for curtains with that sort of pattern, but in a more vivid
colour.”
Catalogues of items for which stock maintenance is difficult or expensive but for
which the search process is essentially visual (e.g. garden design, architecture, interior
decorating, oriental carpets) are especially aided by this application. For such items,
detailed digital image-databases could be supported and updated centrally and
accessed from distributed selling points.
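The interactive fine-tuning described above ("a bit like those, but with a slightly higher heel") can be sketched as feature-space query refinement. The feature names, the `refine`/`nearest` functions, and the catalogue layout are illustrative assumptions, not MPEG-7 descriptors.

```python
import math

def refine(query, liked, adjustments, alpha=0.5):
    """One step of interactive refinement: move the query feature vector
    toward an item the customer liked, then apply their explicit adjustments,
    e.g. {'heel_height': +1.0} for 'a slightly higher heel'."""
    new = {k: query[k] + alpha * (liked[k] - query[k]) for k in query}
    for k, delta in adjustments.items():
        new[k] += delta
    return new

def nearest(catalogue, query):
    """Return the catalogue item whose features are closest to the query."""
    def dist(item):
        return math.sqrt(sum((item["features"][k] - query[k]) ** 2 for k in query))
    return min(catalogue, key=dist)
```

Iterating these two steps (show the nearest matches, absorb the customer's reaction) is exactly the successive refinement the text argues large visual catalogues need.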
Application requirements
• Support for interactive queries with few predicted constraints,
• Support for precise, product-oriented similarity queries, and
• MPEG-7 should operate "as fast as possible," allowing efficient interactive
response times.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for visual-based descriptors.
Application relevant work and references
André, E., J. Mueller, & T. Rist (1997) “Adding Animated Presentation Agents to the
Interface,” To appear in the proceedings of IJCAI 97 - Workshop on Animated
Interface Agents: Making them intelligent, Nagoya, Japan.
Chavez, A., & P. Maes (1996) “Kasbah: An Agent Marketplace for Buying and
Selling Goods,” In proceedings of 1. International Conference on the Practical
Application of Intelligent Agents and Multi-Agent Technology, London, UK.
Rossetto, L., & O. Morton (1997) "Push!" Wired, no. 3.03 UK, March 1997, pp. 69-81.
At the moment there are very few catalogues that provide content-based image
retrieval for teleshopping, and the ones that do understandably focus on
applications where images of merchandise are relatively homogeneous and
standardised, such as wallpaper, flooring, tiles, etc. A searchable online database
of full-colour flooring samples, including carpet and vinyl products, can be found
at <http://www.floorspecs.com>; they promise a colour-match system in the near
future.
6.2 Bio-medical applications
Application description
Medicine is an area in which visual recognition is often a significant technique for
diagnosis. The medical literature abounds with atlases, volumes of photographs that
depict normal and pathological conditions in different parts of the body, viewed at
different scales. An effective diagnosis may often require the ability to recall that a
given condition resembles an image in one of the atlases. The amount of material
catalogued in such atlases is already large and continues to grow. Furthermore, it is
often very difficult to index using textual descriptions only. Therefore, there is a
growing demand for search engines that can respond to image-driven queries.
This will allow physicians to access image-based information in a way that is
similar to current keyword-based search engines such as MEDLINE. (E.g. in
order to make a differential diagnosis, a radiologist might want to compare the
medical records and case histories of all patients in a medical database whose
radiographs showed similar lesions or pathologies.) Furthermore, as 3D imaging
techniques keep gaining importance, such image-driven queries will have to be
able to handle both 2- and 3-dimensional data. Cross-modal search will apply
when one includes associated clinical auditory descriptions, such as associating a
cough with a chest x-ray in order to aid diagnosis.
Biochemical interactions crucially depend on the three-dimensional structure of
the participating molecules (e.g. the shape complementarity between signal
molecules and cell receptors, or the key-in-lock concept of immunological
recognition). Thanks to the sustained effort of a large number of biomedical
laboratories, the list of molecules for which the chemical composition and spatial
structure are documented is growing continuously. Given that it is still extremely
difficult to predict the structure of a biomolecule on the basis of its primary
structure (i.e. the string of constituent atoms), searching these databases will only
be helpful if it can be done on the basis of shape. It is not difficult to make the
leap from this to the ability to search on 3-D models, as proposed by MPEG-7.
Such applications would be extremely helpful in drug design, as one could, for
example, search for biomolecules with shapes similar to a candidate drug to get
an idea of possible side effects.
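The ability to search for images containing similar regions of interest while ignoring the rest of the image can be sketched with ROI-restricted descriptors. The histogram-based sketch below is illustrative only, not a normative descriptor; function names are assumptions.

```python
import numpy as np

def roi_histogram(img, mask, bins=8):
    """Intensity histogram restricted to a region of interest, normalised so
    that ROIs of different sizes and shapes remain comparable."""
    values = img[mask]
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def roi_similarity(img_a, mask_a, img_b, mask_b):
    """Histogram-intersection similarity of two ROIs, ignoring the visual
    characteristics of the rest of both images (1.0 = identical)."""
    ha = roi_histogram(img_a, mask_a)
    hb = roi_histogram(img_b, mask_b)
    return float(np.minimum(ha, hb).sum())
```

Because the boolean mask can take any shape or connectivity, this directly reflects the requirement below of marking regions of interest "from small and compact to large and diffuse"; a clinical system would use richer texture and shape descriptors in place of a plain intensity histogram.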
Application requirements
• The ability to link libraries of correlated and relevant information (e.g. images,
patient history, clinical findings, medication regime),
• The ability to perform on-line annotations and mark regions of interest with any
shape or form of connectivity (from small and compact to large and diffuse),
• The ability to search for images that contain similar regions of interest, ignoring
the visual characteristics of the rest of the image, and
• The ability to handle n-dimensional data (and its time evolution).
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for n-dimensional data descriptors;
• Support for descriptors containing temporal information;
• There must exist a mechanism by which a specific segmentation or hierarchical
level may be chosen, to the exclusion of all other segments.
Application relevant work and references
One development that might be relevant is the Picture Archiving and Communication
System (PACS) which allows physicians to annotate images with text, references and
questions. This greatly facilitates interaction and consultation with colleagues and
specialists.
Furthermore, there is the STARE-project (STructured Analysis of the REtina) at the
University of California, San Diego, which is an information system for the storage
and content-based retrieval of ocular fundus images.
<http://oni.ucsd.edu/stare>
6.3 Universal Access
Application Description
New classes of pervasive computing devices such as personal digital assistants
(PDAs), hand-held computers (HHCs), smart phones, automotive computing
devices, and wearable computers allow users more ubiquitous access to
information than ever.
As users are beginning to rely more heavily on pervasive computing devices, there is a
growing need for applications to bring multimedia information to the devices. The
basic idea of Universal Multimedia Access is to enable client devices with limited
communication, processing, storage and display capabilities to access rich multimedia
content.
Figure 1: Adaptation of multimedia content to pervasive computing devices.
Recently, several solutions have focused on adapting the multimedia content to
the client devices, as illustrated in Figure 1. Universal Multimedia Access can be
provided in
two basic ways -- the first is by storing, managing, selecting, and delivering different
versions of the media objects (images, video, audio, graphics and text) that comprise
the multimedia presentations. The second is by manipulating the media objects
on-the-fly, such as by using methods for text-to-speech translation, image and video
transcoding, media conversion and summarization. This allows the multimedia
content delivery to adapt to the wide diversity of client device capabilities in
communication, processing, storage, and display.
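The first approach, storing and selecting different versions of a media object, can be sketched as follows. The `Variant` fields are illustrative resource-requirement attributes, not normative MPEG-7 descriptors, and the selection rule is deliberately simple.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    """One stored version of a media object (fields are illustrative
    stand-ins for resource-requirement descriptions)."""
    name: str
    width: int
    height: int
    bitrate_kbps: int

def best_variant(variants, screen_w, screen_h, bandwidth_kbps):
    """Select the richest stored variant a client device can actually handle,
    given its display size and network access bandwidth."""
    usable = [v for v in variants
              if v.width <= screen_w and v.height <= screen_h
              and v.bitrate_kbps <= bandwidth_kbps]
    if not usable:
        return None   # no stored version fits: fall back to on-the-fly transcoding
    return max(usable, key=lambda v: v.bitrate_kbps)
```

The `None` branch is where the second approach takes over: transcoding or media conversion produces a fitting version on demand when no pre-stored one matches the device capabilities.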
Application Requirements
In order for the application to work, there must exist mechanisms for the
following:
• Description of the resource requirements of multimedia material, such as data
size, screen-size requirements, and minimum streaming-bandwidth requirements;
• Management and selection of different versions of multimedia material to adapt
to client device capabilities, network infrastructure, and user preferences;
• Description of the capabilities of client devices, such as display size, screen
color depth, audio and video display, storage space, processing power, network
access bandwidth, and so forth;
• Methods for manipulating, transcoding, and summarizing multimedia material;
• Maintenance of the synchronization and layout of transcoded multimedia
material.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• There should be some mechanism for describing media object resource
requirements;
• There should be some mechanism for describing the synchronization needs of
multimedia presentations;
• There should be some mechanism for transcoding hints that allow content
authors to guide the processes of manipulating and adapting multimedia material;
• There must be mechanisms that allow the direct manipulation of multimedia
material.
Application relevant work and references
T. Ebrahimi and C. Christopoulos, “Can MPEG-7 be used beyond database
applications?”, ISO/IEC JTC1/SC29/WG11 MPEG98/MP3861, October 1998,
Atlantic City, USA
Charilaos Christopoulos, Touradj Ebrahimi, V. V. Vinod, John R. Smith, Rakesh
Mohan, and Chung-Sheng Li, "MPEG-7 application: Universal Access
Through Content Repurposing and Media Conversion", ISO/IEC
JTC1/SC29/WG11 MPEG99/MP4433, March 1999, Seoul, Korea.
L. Bergman et al, “Integrated DS for Multimedia Content”, ISO/IEC
JTC1/SC29/WG11 MPEG99/MP473, February 1999, Lancaster, UK
J.R. Smith, C.-S. Li, “P473: InfoPyramid Description Scheme for Multimedia
Content”, ISO/IEC JTC1/SC29/WG11 MPEG99/MP473 (slides), February
1999, Lancaster, UK.
6.4 Remote Sensing Applications
Application description
In remote sensing applications, the scale of satellite image databases (several
million images acquired according to various modalities: panchromatic,
multispectral, hyperspectral, hexagonal sampling, etc.), the diversity of potential
users (scientists, military, geologists, etc.), and improvements in
telecommunication techniques make it necessary to define a highly efficient
description standard. Until now, information search in image libraries has been
based on textual information such as scene name and geographic, spectral, and
temporal information. Based on this, information exchange is achieved by means
of tapes and photographs.
A challenging aspect is to provide capabilities for exploiting such complex
databases from on-line systems supporting the following functionalities:
• textual query,
• image query based on either the whole or part of a reference image (one or
several spectral bands),
• content-based retrieval,
• browsing, and
• confidentiality and data protection.
MPEG-7 should be an appropriate framework for meeting such requirements.
Application requirements
• Support for various description schemes,
• Support for different data types (e.g. multispectral, hyperspectral, and SAR)
associated with various sensors (ground resolution, wavelength),
• The ability to include multiple descriptions of the same documents,
• The ability to link correlated information and similar regions of interest, and
• The ability to take into account time evolution for 2D and 3D data.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Support for multiple descriptors and description schemes for the same data;
• Support for descriptors for unique data types;
• Support for descriptors embodying temporal change;
• There must be a mechanism by which descriptors and description schemes
within a description may be linked to other D's and DS's within the same
description.
Application relevant work and references
[to come]
Semi-automated multimedia editing
Application Description
Given sufficient information about its contents, what could a multimedia object do?
With sufficient information about its own structure combined with methods on how to
manipulate that structure, a ‘smart’ multimedia clip could start to edit itself in a
manner appropriate to its neighbouring multimedia. For example, a piece of music and
a video clip, from different sources, could be combined in a way such that the music
stretches and contracts to synchronise with specific ‘hit’ points in the video, and thus
create an appropriate and customised soundtrack.
This could be a new paradigm for multimedia, adding a ‘method’ layer on top of
MPEG-7’s ‘representation’ layer. By making multimedia ‘aware,’ to an extent, one
opens access to beginning users and increases productivity for experts. Such hidden
intelligence on the part of the data itself shifts multimedia editing from direct
manipulation to loose management of data.
Semi-automated multimedia editing is a broad category of applications. It can
facilitate video editing for home users as well as experts in studios through varying
amounts of guidance or assistance through the process. In its simpler version, assisted
editing can consist of an MPEG-7-enabled browser for the selection of video shots,
using a suitable shot description language. In an intermediate version, assisted editing
can include planning, i.e. proposing shot selections and edit points, satisfying a
scenario expressed in a sequence description language.
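The music-stretching example above amounts to a time warp that pins selected musical beats to video "hit" points. A minimal piecewise-linear sketch follows; the pairing of the i-th chosen beat with the i-th hit point is an assumption, and beat/hit times are taken to be in seconds, positive, and increasing.

```python
def time_warp(beat_times, hit_times):
    """Build a piecewise-linear warp that stretches/contracts music so that
    chosen beats land exactly on video hit points. Returns a function
    mapping an original music time (s) to the warped timeline."""
    anchors = list(zip([0.0] + beat_times, [0.0] + hit_times))

    def warp(t):
        for (b0, h0), (b1, h1) in zip(anchors, anchors[1:]):
            if b0 <= t <= b1:
                return h0 + (t - b0) * (h1 - h0) / (b1 - b0)
        # beyond the last anchor: continue at the final stretch ratio
        (b0, h0), (b1, h1) = anchors[-2], anchors[-1]
        return h1 + (t - b1) * (h1 - h0) / (b1 - b0)

    return warp
```

A "smart" clip carrying beat descriptions plus such a method could resynchronise itself to any neighbouring video, which is the self-editing behaviour the text describes.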
Application requirements
• Pointers as 'handles' that refer to the data directly, to allow manipulation of the
multimedia.
MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
• Ability to link descriptors to the data that they describe.
Application relevant work and references
Bloch, G.R (1986) Elements d’une Machine de Montage Pour l’Audio-Visuel. Ph.D.,
Ecole Nationale Superieure Des Telecommunications.
Nack, F. (1996) AUTEUR: The Application of Video Semantics and Theme
Representation for Automated Video Editing. Ph.D. Thesis, Lancaster
University.
Parkes, A.P. (1989) An Artificial Intelligence Approach to the Conceptual Description
of Videodisc Images. Ph.D. Thesis, Lancaster University.
Sack, W., & Davis, M. (1994). IDIC: Assembling Video Sequences from Story Plans
and Content Annotations. IEEE International Conference on Multimedia
Computing Systems, Boston, MA: May 14 - 19, 1994.
Sack, W., & Don, A. (1993) Splicer: An Intelligent Video Editor (Unpublished
Working Paper, MIT).
6.5 Educational applications
Application description
The challenge of using multimedia in educational software is to make as much use of
the intrinsic information as possible to support different pedagogical approaches such
as summarisation, question answering, or detection of and reaction to
misunderstanding or non-understanding.
By providing direct access to short video sequences within a large database, MPEG-7
can promote the use of audio, video and film archive material in higher education in
many areas:
History: Radio, television and film provide detailed accounts of many contemporary
events, useful for classroom presentations, provided that a sufficiently precise
(MPEG-7) description can be queried based on dates, places, personalities, etc. (see
also 4.5, Historical Speech Database).
Performing arts (music, theatre): Fine-grained, standardised descriptions can be used to bring a selection of relevant documents into the classroom for special classes, using on-line video archives as opposed to costly local tape libraries. For instance, several productions of a theatrical scene, or of a musical work, can thus be consulted for comparison and illustration. Because classic and contemporary theatre are widely available in translation, this application can target worldwide audiences.
Film Music: A tool can be developed for improving the knowledge and skills of users in the domain of film theory/practice and film music (music for film genres). Depending on the user’s background, the system should provide enough material not only to improve the user’s ability to understand the complexity of each single medium, but also to handle the complex relationships between the two media, film and music. To achieve this, the system should offer an environment in which the student can perform guided/supported experiments, e.g. on editing film, mixing sound, or combining both, which requires that the system can analyse and criticise the results achieved by the user.
Thus, this system must be able to automatically generate film/sound sequences and their synchronisation based on stereotypical music/film patterns for film genres, and should perhaps offer ways to creatively break the established generation rules.

Application requirements
- Linking mechanisms to synchronise between MPEG-7 descriptors and other sources of information (e.g. HTML, SGML, World Wide Web services, etc.)
- Mechanisms for allowing specialised vocabularies.
- The ability to allow real-time operation in conjunction with a database.

MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
- Support for interoperation with description schemes.
- Support for descriptors and description schemes that allow specialised languages or vocabularies.
- There must exist a mechanism by which descriptions may link to external information.
Application relevant work and references
Boden, M. (1991). The Creative Mind: Myths and Mechanisms. Basic Books, New York.
Schank, R. C. (1994). Active Learning through Multimedia. IEEE MultiMedia, 1(1), 69-78.
Sharp, D., Kinzer, C., Risko, V. & the Cognition and Technology Group at Vanderbilt University. (1994). The Young Children's Video Project: Video and software tools for accelerating literacy in at-risk children. Paper presented at the National Reading Conference, San Diego, CA. http://www.edc.org/FSC/NCIP/ASL_VidSoft.html
Tagg, Philip (1980, ed.). Film Music, Mood Music and Popular Music Research. Interviews, Conversations, Entretiens. SPGUMD 8002.
Tagg, Philip (1987). Musicology and the Semiotics of Popular Music. Semiotica, 66(1/3): 279-298. (This and other texts accessible on-line via http://www.liv.ac.uk/ipm/tagg/taggwbtx.htm)
See also the reference section of 6.3-Semi-automated multimedia editing and Annex A.
6.6 Surveillance applications
Application description
There are a number of surveillance applications, in which a camera monitors sensitive
areas and where the system must trigger an action if some event occurs. The system
may build its database from no information or limited information, and accumulate a
video database and meta-data as time elapses. Meta-content extraction (at an
“encoder” site) and meta-data exploitation (at a “decoder” site) should exploit the
same database.
As time elapses and the database becomes sufficiently large, the system, at both sides, should have the ability to support operations on the database, such as:
- Search the audio/video database for a specific event (synthetic or current data); an event is a sequence of audio/video data.
- Find similar events in the past.
- Make decisions on the current data in relation to the accumulated database, and/or to a priori known data.
A related application is in security and forensics, in the matching of faces or
fingerprints.
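The “find similar events” operation above can be sketched as a nearest-neighbour search over per-event feature vectors. The sketch below uses a plain Euclidean distance over hypothetical fixed-length features (e.g. motion energy, duration, audio loudness); the feature choice and event identifiers are invented for illustration and are not drawn from MPEG-7.

```python
import math

def event_distance(a, b):
    """Euclidean distance between two fixed-length event feature vectors
    (e.g., per-event motion energy, duration, audio loudness)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_similar_events(query, database, k=2):
    """Return the k events in the accumulated database closest to the query.
    'database' maps event ids to feature vectors."""
    ranked = sorted(database.items(), key=lambda kv: event_distance(query, kv[1]))
    return [event_id for event_id, _ in ranked[:k]]

# Accumulated past events (hypothetical feature vectors).
past_events = {
    "door_opened_0310": [0.9, 2.0, 0.3],
    "glass_break_0311": [0.7, 0.5, 0.95],
    "idle_0312":        [0.05, 30.0, 0.02],
}
matches = find_similar_events([0.8, 0.6, 0.9], past_events, k=1)
```

A deployed system would use domain-specific descriptors and an index rather than a linear scan, but the retrieval step reduces to the same ranking by distance.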

Application requirements
- Real-time operation in conjunction with a database, and
- Domain-specific features.

MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
- Support for descriptors and description schemes that allow specialised languages or vocabularies.
- Support for descriptors for unique data types.
Application relevant work and references
Courtney, J. D. (1997). Automatic Video Indexing by Object Motion Analysis. Pattern
Recognition, vol. 30, no. 4, 607-626.
6.7 Visually-based control
Application description
In the field of control, there have been several developments in the area of visually
based control. Instead of using text-based approaches for control programming,
images, visual objects, and image sequences are used to specify the control behaviour
and are an integral part of the control loop (e.g. visual servoing).
One aspect of describing control information between (video) objects is that objects are not necessarily associated via spatio-temporal relationships. Accumulation of the control and video information allows visually based functions such as redo, undo, search-by-task, or object relationship changes [6].
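The arbitrary, non-spatio-temporal associations described above can be pictured as a labelled graph between object nodes. In this Python sketch, with invented relation names, search-by-task reduces to querying edges by relation label; nothing here is part of the standard.

```python
from collections import defaultdict

class ObjectRelationGraph:
    """Relationships between arbitrary object nodes, beyond spatio-temporal
    ones. Edges are labelled with a relation type (e.g. a control association)."""
    def __init__(self):
        self.edges = defaultdict(list)   # node -> [(relation, other_node)]

    def relate(self, a, relation, b):
        """Record a directed, labelled association between two objects."""
        self.edges[a].append((relation, b))

    def search(self, relation):
        """Search-by-task: all (source, target) pairs linked by 'relation'."""
        return [(a, b) for a, rels in self.edges.items()
                for rel, b in rels if rel == relation]

g = ObjectRelationGraph()
g.relate("gripper", "grasps", "bolt")
g.relate("gripper", "approaches", "plate")
g.relate("bolt", "fastens", "plate")
found = g.search("grasps")
```

Redo/undo would correspond to replaying or rolling back the accumulated edge insertions in such a structure.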

Application requirements
- Description of relationships between arbitrary object nodes in addition to spatio-temporal relationships, as in e.g. BIFS.
- Allow searches based on the arbitrary (control) associations.
- MPEG-7 should operate “as fast as possible,” allowing efficient interactive response times.

MPEG-7 Specific Requirements
Requirements specific to MPEG-7 are:
- Support for descriptors and description schemes containing spatio-temporal relationships.
- Support for descriptors containing relationships between arbitrary objects.
Application relevant work and references
Palm, S.R., Mori, T., Sato, T. (1998) Bilateral Behavior Media: Visually Based
Teleoperation Control with Accumulation and Support, Submitted to Robotics
& Automation Magazine special issue on Visual Servoing.
7. Applications Making Use of Specific Descriptors & Description
Schemes
The applications described in this section were added after the Committee Draft of the MPEG-7 standard was issued. They represent target applications based on some of the specific descriptors and description schemes that have been adopted by the standard.
7.1 Transcoding Applications
Introduction
The amount of audiovisual (AV-) content that is transmitted over optical, wireless and
wired networks is growing. Furthermore, within each of these networks, there are a
variety of AV-terminals that support different formats. This scenario is often referred
to as Universal Multimedia Access (UMA), which has been described in section 6.3.
The involved networks are often characterized by different network bandwidth
constraints, and the terminals themselves vary in display capabilities, processing
power and memory capacity. Therefore, it is required to represent and deliver the
content according to the current network and terminal characteristics. For example,
audiovisual content stored in a compressed format such as MPEG-1, 2 or 4 has to be
converted between different bit-rates and frame-rates, but must also account for
different screen sizes, decoding complexity and memory constraints of the various
AV-terminals.
To avoid storing the content in different compressed representations for different
network bandwidths and different AV-terminals, low-delay and real-time transcoding
from one format to another is required. Additionally, it is critical that subjective quality be preserved. To better meet these requirements, Transcoding Hints have been defined by the MPEG-7 standard. The remainder of this section describes some specific application scenarios that could benefit from MPEG-7 transcoding hints, and provides some input regarding device capabilities.
In the following, several distinct application scenarios that could benefit from MPEG-7 transcoding hints are discussed. The transcoding hints referred to can be found in section 8 of ISO/IEC CD 15938-5 (Multimedia Description Schemes). Each of these applications differs in the way the transcoding hints may be extracted and used. However, all follow the general framework illustrated in Fig. 2. For each of the applications considered, we highlight differences in the flow of information, including the extraction and use of the transcoding hints, as well as the corresponding devices that generate and consume the description.
[Figure 2: Generic Illustration of Transcoding Hints Application Scenario. Content from a database, network or local device, together with extracted transcoding hints (meta-data), is delivered to an MPEG-7 consuming device (a transcoder, bits to bits; an encoder, video to bits; or a decoder, bits to video), which produces the output according to the network/terminal characteristics.]
Server/Gateway Application
This is probably the most intuitive application, and it is also extensible to network nodes such as a base station or gateway. In the case of content being stored on a server, the extraction is performed at the server, where real-time constraints may be relaxed. In the case of content being transmitted through the network, extraction has already been performed and the hints are encapsulated with the content. In both cases, transcoding should be performed in real-time. No complexity is added in the consuming device (e.g., mobile phone). In addition, since the transcoding hints have been generated offline, the server/gateway can implement the transcoding quickly and efficiently, thereby minimizing delay.
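As a rough sketch of the decision such a server or gateway might make, the following Python fragment picks target parameters from hypothetical terminal and network characteristics plus a normalised difficulty hint carried with the content. The field names, thresholds and the 0-to-1 hint scale are illustrative assumptions, not the normative hint syntax.

```python
def select_transcoding_params(terminal, network_kbps, hint_difficulty):
    """Pick target parameters for a transcoding session.
    'terminal' gives display width and a bit-rate cap; 'hint_difficulty'
    in [0, 1] is a precomputed coding-complexity hint shipped with the content."""
    # Never exceed what the network or the terminal can carry.
    target_bitrate = min(terminal["max_kbps"], network_kbps)
    # Downscale if the source (assumed 1920 wide) exceeds the display.
    scale = min(1.0, terminal["width"] / 1920)
    # Difficult content at a tight budget: drop the frame rate instead
    # of letting per-frame quality collapse (illustrative threshold).
    fps = 30 if hint_difficulty * 1000 <= target_bitrate else 15
    return {"kbps": target_bitrate, "scale": round(scale, 3), "fps": fps}

phone = {"width": 320, "max_kbps": 128}
params = select_transcoding_params(phone, network_kbps=96, hint_difficulty=0.4)
```

Because the hint is computed offline, this selection step costs the gateway almost nothing at session setup time, which is what keeps the added delay low.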
As a variation of this application scenario, Fig. 3 illustrates a situation with two servers, a Broadcast/Internet server and a Home server. The major differences from the above are that (i) there are two servers and (ii) the real-time transcoding constraints may be relaxed. In the example shown, the Broadcast/Internet server distributes content, such as MPEG-2 encoded video, with associated transcoding hints to a local home server. The home server enables a wide range of interconnections and may be connected to the client through a wireless link (e.g. Bluetooth), a cable link (e.g. IEEE 1394, USB), or portable media such as flash memory or several kinds of magneto-optical disks.
[Figure 3: Two-Server Application Scenario. An Internet/Broadcast server extracts metadata and provides content and metadata to a Home server, which performs the transcoding (non-real-time and real-time) before consumption by the client.]
Client Upload/Content Sharing Application
This application assumes that image/video content is captured locally (e.g., on a
mobile device or remote broadcast camera) in one format and the user would like to
upload the content (e.g., to a home/central server) in another format. This application
may tolerate delay, but nevertheless should be carried out with good efficiency and
preserve quality. The media conversion may consider conversion between
compression formats, such as DV to MPEG-1, or within the same compression format
but at a lower frame-rate, bit-rate or picture resolution. The major objective is content
sharing, either for live viewing or interoperability with other devices. For this
application, the extraction of transcoding hints and the actual transcoding is performed
at the client. In another application scenario, the client might have to send this content
to other clients (in a Multimedia Messaging scenario, for example). Since the terminal
capabilities are different, the transcoding hints will be useful in order to speed up the
transcoding in the gateway.
Mobile Video Camera
The illustration of Fig. 4 considers the case of a mobile video camera that is capable of extracting the transcoding hints. As many video cameras have the ability to save the video sequence (or parts of it) in different representations, the transcoding hints could be very useful. Many video cameras can also transmit the captured or stored video by cable (IEEE 1394, USB, etc.) or remote link (Bluetooth, etc.) to a home video server or other devices, which perform the transcoding themselves, employing the generated transcoding hints.
A related application is a mobile live video camera with built-in transcoding hint extraction, providing video streaming over IP-based networks. The transcoding hints associated with the content can help the transcoder in a gateway/network server to adapt to different target bit-rates.
[Figure 4: Mobile Video Camera Scenario. A video camera with built-in transcoding metadata extraction transfers content and transcoding metadata to a home server (via flash memory card, Bluetooth, IEEE 1394, USB, etc.), which transcodes the content for display.]
Mobile Video Phone
In this application scenario (Fig. 5), video with associated transcoding hints is passed through or transcoded by video-enabled mobile phones. For example, a user downloads a short video clip of a baseball highlight scene with associated transcoding hints from the Internet using a video-enabled mobile phone. Later, the user wants to share the baseball video clip with his/her friends at the same bit-rate or at reduced bit-rate and quality.
In the first case, the video phone only has to pass the video clip and the transcoding hints to the receiver. The receiving user may still need the transcoding hints for redistribution or archiving of the video clip.
In the latter case, the mobile video phone has to convert the video clip between the different formats of the sender and receiver using the transcoding hints. Some of the receivers may only be interested in still images of the baseball game, which may be generated using importance and/or mosaic metadata. Finally, some receivers of the baseball video clip (or baseball still images) may want to store these on their home server in the same format as the other baseball images or video content already stored there. For this purpose, the user again needs the transcoding hints.
[Figure 5: Mobile Video Phone Scenario. A video phone receives content and transcoding metadata over the mobile network, displays it, and transfers video clips or still images together with the transcoding metadata to a home server (via flash memory card, Bluetooth, IEEE 1394, USB, etc.).]
Scalable Content Playback Application
In this application, content is being received at the client, but the terminal does not
have the resources to decode and/or display the full bitstream in terms of memory,
screen resolution or processing power. Therefore, the transcoding hints may be used to
adaptively decode the bitstream, e.g., by dropping frames/objects or decoding only the
important parts of a scene, as specified by the received transcoding hints.
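One way to realise such adaptive decoding can be sketched as follows: given a per-frame importance value derived from the received hints and the number of frames the terminal can afford to decode, keep only the most important frames. The importance values and budget below are invented for illustration; the normative hint semantics are defined in ISO/IEC 15938-5.

```python
def frames_to_decode(frame_importance, decode_budget):
    """Given per-frame importance hints and the number of frames the
    terminal can afford to decode, keep the most important frames and
    return their indices in display order."""
    if decode_budget >= len(frame_importance):
        return list(range(len(frame_importance)))
    # Rank frames by importance, keep the top ones, restore display order.
    ranked = sorted(range(len(frame_importance)),
                    key=lambda i: frame_importance[i], reverse=True)
    return sorted(ranked[:decode_budget])

importance = [0.9, 0.2, 0.6, 0.1, 0.8]   # e.g. derived from received hints
selected = frames_to_decode(importance, decode_budget=3)
```

The same selection logic applies per object rather than per frame when only the important objects of a scene are to be decoded.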
Specific Usage of Transcoding Hints
The Transcoding Hints can be used for complexity reduction as well as for quality improvement in the transcoding process. The hints described here are the Difficulty Hint, the Shape Hint and the Motion Hints. The Difficulty Hint describes the bit-rate coding complexity. This hint can be used for improved bit-rate control and for bit-rate conversion from constant bit-rate (CBR) to variable bit-rate (VBR). The Shape Hint specifies the amount of change in a shape boundary over time and is proposed to overcome the composition problem when encoding or transcoding multiple video objects with different frame-rates. The Motion Hints describe (i) the motion range, (ii) the motion uncompensability and (iii) the motion intensity. This meta-data can be used for a number of tasks including anchor frame selection, encoding mode decisions, frame-rate and bit-rate control, as well as bit-rate allocation among several video objects for object-based MPEG-4 transcoding. These transcoding hints, especially the search range hints, also aim at reducing the computational complexity of the transcoding process. Further details can be found in the MDS XM document, as well as the references cited below.
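For instance, the CBR-to-VBR use of the Difficulty Hint can be illustrated by distributing a fixed bit budget over segments in proportion to their difficulty values. The numbers below are invented, and the normalised 0-to-1 difficulty scale is an assumption for this sketch; the normative hint syntax is defined in ISO/IEC 15938-5.

```python
def allocate_bits(segment_difficulty, total_bits):
    """CBR-to-VBR conversion guided by a per-segment Difficulty Hint:
    distribute the overall bit budget proportionally to coding difficulty.
    (Rounding may leave the sum a few bits off the budget in general.)"""
    total_difficulty = sum(segment_difficulty)
    return [round(total_bits * d / total_difficulty) for d in segment_difficulty]

# Four segments of equal duration; under CBR each would get 250 kbits.
difficulty = [0.2, 0.5, 0.2, 0.1]   # hypothetical hint values
budget = allocate_bits(difficulty, total_bits=1000)
```

The difficult second segment receives twice its CBR share, while the easy final segment gives bits back, which is exactly the quality-levelling effect the hint is meant to enable.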
References
Kuhn, P. (2000) Camera Motion Estimation using feature points in MPEG compressed domain, Proc. IEEE Int’l Conf. on Image Processing, Vancouver, CA.
Kuhn, P. and Suzuki, T. (2001) MPEG-7 Meta-data for Video Transcoding: Motion and Difficulty Hints, Proc. Storage and Retrieval for Multimedia Databases, SPIE 4315, San Jose, CA.
Vetro, A., Sun, H. and Wang, Y. (2001) Object-Based Transcoding for Adaptable Content Delivery, IEEE Trans. Circuits and Systems for Video Technology, to appear March 2001.
8. References
[1] MPEG Requirements Group, “MPEG-7 Context, Objectives, and Technical
Roadmap”, Doc. ISO/MPEG N2861, MPEG Vancouver Meeting, July 1999.
[2] MPEG Requirements Group, “MPEG-7 Requirements”, Doc. ISO/MPEG
N2859, MPEG Vancouver Meeting, July 1999.
[3] Rémi Ronfard, “MPEG-7 Applications in Radio, Film, and TV archives,” Doc.
ISO/MPEG M2791, MPEG Fribourg Meeting, October 1997.
[4] Whoi-Yul Kim et al, “MPEG-7 Applications Document,” Doc. ISO/MPEG
M3955, MPEG Atlantic City Meeting, October 1997.
[5] Jens-Rainer Ohm et al, “Broadcast Application and Requirements for MPEG-7,” Doc. ISO/MPEG M4107, MPEG Atlantic City Meeting, October 1997.
[6] Stephen Palm, “Visually Based Control: another Application for MPEG-7,”
Doc. ISO/MPEG M3399, MPEG Tokyo Meeting, March 1998.
Annex A: Supplementary references, by application
4.1 & 4.2 Storage and retrieval of video databases & Delivery of pictures
and video for professional media production
Aguierre Smith, T. G., & Davenport, G. (1992). The Stratification System. A Design
Environment for Random Access Video. In ACM workshop on Networking
and Operating System Support for Digital Audio and Video, San Diego,
California
Aguierre Smith, T. G., & Pincever, N. C. (1991). Parsing Movies In Context. In
Proceedings of the Summer 1991 Usenix Conference, (pp. 157-168).
Nashville, Tennessee.
Aigrain, P., & Joly, P. (1994). The automatic real-time analysis of film editing and transformation effects and its applications. Computer & Graphics, 18(1), 93-103.
American Library Association's ALCTS/LITA/RUSA Machine-Readable Bibliographic Information Committee. (1996). The USMARC Formats: Background and Principles. MARC Standards Office, Library of Congress, Washington, D.C., November 1996.
Bateman, J. A., Magnini, B., & Rinaldi, F. (1994). The Generalized Italian, German, English Upper Model. In Proceedings of the ECAI94 Workshop: Comparison of Implemented Ontologies, Amsterdam.
Bloch, G. R. (1986) Elements d'une Machine de Montage Pour l'Audio-Visuel. Ph.D.,
Ecole Nationale Superieure Des Telecommunications.
Bobrow, D. G., & Winograd, T. (1985). An Overview of KRL: A Knowledge
Representation Language. In R. J. Brachman & H. J. Levesque (Eds.),
Readings in Knowledge Representation (pp. 263 - 285). San Mateo,
California: Morgan Kaufmann Publishers.
Butler, S., & Parkes, A. (1996). Film Sequence Generation Strategies for generic
Automatic Intelligent Video Editing. Applied Artificial Intelligence (AAI)
[Ed: Hiroaki Kitano], Vol. 11, No. 4, pp. 367-388.
Butz, A. (1995). BETTY - Ein System zur Planung und Generierung informativer
Animationssequenzen (Document No. DFKI-D-95-02). Deutsches
Forschungszentrum fur Kunstliche Intelligenz GmbH.
Chakravarthy, A., Haase, K. B., & Weitzman, L. (1992). A uniform Memory-based
Representation for Visual Languages. In B. Neumann (Ed.), ECAI 92
Proceedings of the 10th European Conference on Artificial Intelligence, (pp.
769 - 773). Wiley, Chichester: Springer Verlag.
Chakravarthy, A. S. (1994). Toward Semantic Retrieval of Pictures and Video. In C.
Baudin, M. Davis, S. Kedar, & D. M. Russell (Ed.), AAAI-94 Workshop
Program on Indexing and Reuse in Multimedia Systems, (pp. 12 - 18). Seattle,
Washington: AAAI Press.
Davenport, G., Aguierre Smith, T., & Pincever, N. (1991). Cinematic Primitives for
Multimedia. IEEE Computer Graphics & Applications (7), 67-74.
Davenport, G., & Murtaugh, M. (1995). ConText: Towards the Evolving
Documentary. In ACM Multimedia 95 - Electronic Proceedings. San
Francisco, California: November 5-9, 1995.
http://ic.www.media.edu/icPublications/gdlist.html
Davis, M. (1995) Media Streams: Representing Video for Retrieval and Repurposing.
Ph.D., MIT.
Del Bimbo, A., Vicario, E., & Zingoni, D. (1992). A Spatio-Temporal Logic for
Sequence Coding and Retrieval. In IEEE Workshop on Visual Languages,
(pp. 228 - 231). Seattle, Washington: IEEE Computer Society Press.
Del Bimbo, A., Vicario, E., & Zingoni, D. (1993). Sequence Retrieval by Contents
through Spatio Temporal Indexing. In IEEE Symposium on Visual Languages,
(pp. 88 - 92). Bergen, Norway: IEEE Computer Society Press.
Domeshek, E. A., & Gordon, A. S. (1995). Structuring Indexing for Video. In J. Lee
(Ed.), First International Workshop on Intelligence and Multimodality in
Multimedia Interfaces: Research and Applications.. Edinburgh University:
July 13 - 14, 1995.
Gregory, J. R. (1961) Some Psychological Aspects of Motion Picture Montage. Ph.D.
Thesis, University of Illinois.
Haase, K. (1994). FRAMER: A Persistent Portable Representation Library. In ECAI
94 European Conference on Artificial Intelligence, (pp. 732- 736).
Amsterdam, The Netherlands.
Hampapur, A., Jain, R., & Weymouth, T. E. (1995a). Indexing in Video Databases. In
Storage and Retrieval for Image and Video Databases II, (pp. 292 - 306). San
Jose, California, 9 - 10 February 1995: SPIE.
Hampapur, A., Jain, R., & Weymouth, T. E. (1995b). Production Model Based Digital
Video Segmentation. Multimedia Tools and Applications, 1, 9 - 46.
International Standard Z39.50: "Information Retrieval (Z39.50): Application Service
Definition and Protocol Specification". http://lcweb.loc.gov/z3950/agency/
Isenhour, J. P. (1975). The Effects of Context and Order in Film Editing. AV
Communication Review, 23(1), 69 - 80.
Lenat, D. B., & Guha, R. V. (1990). Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley.
Lenat, D. B., & Guha, R. V. (1994). Strongly Semantic Information Retrieval. In C.
Baudin, M. Davis, S. Kedar, & D. M. Russell (Ed.), AAAI-94 Workshop
Program on Indexing and Reuse in Multimedia Systems, (pp. 58 - 68). Seattle,
Washington: AAAI Press.
Mackay, W. E., & Davenport, G. (1989). Virtual Video Editing in Interactive
Multimedia Applications. Communications of the ACM, 32(7), 802 - 810.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1993).
Introduction to WordNet: An On-line Lexical Database
(ftp://clarity.princeton.edu/pub/wordnet/5papers.ps). Cognitive Science
Laboratory, Princeton University.
Nack, F. (August 1996) AUTEUR: The Application of Video Semantics and Theme
Representation for Automated Film Editing. Ph.D. Thesis, Lancaster
University
Nack, F. and Parkes, A. (1995). AUTEUR: The Creation of Humorous Scenes Using
Automated Video Editing. Proceedings of JCAI-95 Workshop on AI
Entertainment and AI/Alife, pp. 82 - 84, Montreal, Canada, August 19, 1995.
Nack, F. and Parkes, A. (1997) Towards the Automated Editing of Theme-Oriented
Video Sequences. Applied Artificial Intelligence (AAI) [Ed: Hiroaki Kitano],
Vol. 11, No. 4, pp. 331-366.
Nagasaka, A., & Tanaka, Y. (1992). Automatic video indexing and full-search for
video appearance. In E. Knuth & I. M. Wegener (Eds.), Visual Database
Systems (pp. 113 - 127). Amsterdam: Elsevier Science Publishers.
Oomoto, E., & Tanaka, K. (1993). OVID: Design and Implementation of a Video-Object Database System. IEEE Transactions On Knowledge And Data Engineering, 5(4), 629-643.
Parkes, A. P. (1989a) An Artificial Intelligence Approach to the Conceptual
Description of Videodisc Images. Ph.D. Thesis, Lancaster University.
Parkes, A. P. (1989c). Settings and the Settings Structure: The Description and
Automated Propagation of Networks for Perusing Videodisk Image States. In
N. J. Belkin & C. J. van Rijsbergen (Ed.), SIGIR '89, (pp. 229 - 238).
Cambridge, MA:
Parkes, A. P. (1992). Computer-controlled video for intelligent interactive use: a description methodology. In A. D. N. Edwards & S. Holland (Eds.), Multimedia Interface Design in Education (pp. 97-116). New York: Springer-Verlag.
Parkes, A., Nack, F. and Butler, S. (1994) Artificial intelligence techniques and film
structure knowledge for the representation and manipulation of video.
Proceedings of RIAO '94, Intelligent Multimedia Information Retrieval
Systems and Management, Vol. 2, Rockefeller University, New York, October
11-13, 1994.
Pentland, A., Picard, R., Davenport, G., & Welsh, B. (1993). The BT/MIT Project on
Advanced Tools for Telecommunications: An Overview (Perceptual
Computing Technical Report No. 212). MIT.
Sack, W., & Davis, M. (1994). IDIC: Assembling Video Sequences from Story Plans
and Content Annotations. In IEEE International Conference on Multimedia
Computing and Systems. Boston, Ma: May 14 - 19, 1994.
Sack, W., & Don, A. (1993). Splicer: An Intelligent Video Editor (Unpublished
Working Paper).
Tonomura, Y., Akutsu, A., Taniguchi, Y., & Suzuki, G. (1994). Structured Video
Computing. IEEE MultiMedia, 1(3), 34 - 43.
Ueda, H., Miyatake, T., Sumino, S., & Nagasaka, A. (1993). Automatic Structure Visualization for Video Editing. In ACM & IFIP INTERCHI '93, (pp. 137-141).
Ueda, H., Miyatake, T., & Yoshizawa, S. (1991). IMPACT: An Interactive Natural-Motion-Picture Dedicated Multimedia Authoring System. In Proc. ACM CHI '91 Conference on Human Factors In Computing Systems, (pp. 343-450).
Yeung, M. M., Yeo, B., Wolf, W. & Liu, B. (1995). Video Browsing using Clustering
and Scene Transitions on Compressed Sequences. In Proceedings IS&T/SPIE
'95 Multimedia Computing and Networking, San Jose. SPIE (2417), 399 - 413.
Zhang, H., Kankanhalli, A., & Smoliar, S. W. (1993). Automatic Partitioning of Full-Motion Video. Multimedia Systems, 1, 10-28.
We are still seeking references regarding ANSI guidelines for Multi-lingual
Thesaurus and International radio and television typology.
5.1 User agent driven media selection and filtering
Maes, P. (1994b) “Modeling Adaptive Autonomous Agents,” Journal of Artificial Life, vol. 1, no. 1/2, pp. 135-162.
Hanke Fjordhotel, Norway, August 9-12, 1997.
Parise, S., S. Kiesler, L. Sproull, & K. Waters (1996) “My Partner is a Real Dog: Cooperation with Social Agents,” In proceedings of Conference on Computer Supported Cooperative Work / CSCW 96, edited by M. S. Ackermann, pp. 399-408, Hyatt Regency Hotel, Cambridge, Mass.: ACM.
Rossetto, L., & O. Morton (1997) “Push!,” Wired, no. 3.03 UK, March 1997, pp. 69-81.
5.2 Intelligent multimedia presentation
Andre, E. (1995). Ein planbasierter Ansatz zur Generierung multimedialer Präsentationen. Ph.D., Sankt Augustin: INFIX, Dr. Ekkerhard Hundt.
Andre, E., & Rist, T. (1994). Multimedia Presentations: The Support of Passive and Active Viewing. In AAAI Spring Symposium on Intelligent Multi-Media Multi-Modal Systems, (pp. 22-29). Stanford University: AAAI.
Maybury, M. T. (1991) “Planning Multimedia Explanations using Communicative
Acts,” In proceedings of Ninth National Conference on Artificial Intelligence,
AAAI-91, pp. 61 - 66, Anaheim, CA: AAAI/MIT Press.
Riesbeck, C. K., & Schank, R. C. (1989). Inside case-based reasoning. Hillsdale,
New Jersey: Lawrence Erlbaum Associates.
6.3 Universal Access
Skodras, A.N., and Christopoulos, C., "Down-sampling of compressed images in the DCT domain", European Signal Processing Conference (EUSIPCO), 1998.
J. R. Smith, R. Mohan, and C.-S. Li, “Transcoding Internet Content for
Heterogeneous Client Devices”, IEEE Intern. Conf. Circuits and Systems,
June, 1998.
J. R. Smith, R. Mohan, and C.-S. Li, “Content-based Transcoding of Images in the
Internet”, IEEE Intern. Conf. Image Processing, Oct., 1998.
C.-S. Li, R. Mohan, and J. R. Smith, “Multimedia content description in the
InfoPyramid”, IEEE Intern. Conf. Acoustics, Speech and Signal Processing,
May, 1998.
Björk N. and Christopoulos C., "Transcoder Architectures for video coding", IEEE
International Conference on Acoustic Speech and Signal Processing (ICASSP
98), Seattle, Washington, Vol. 5, pp. 2813-2816, May 12-15, 1998.
Björk, N. and Christopoulos, C., "Transcoder Architectures for video coding", IEEE Transactions on Consumer Electronics, Vol. 44, No. 1, pp. 88-98, February 1998.
S. Paek and J. R. Smith, “Detecting Image Purpose in World-Wide Web Documents”,
SPIE/IS&T Photonics West, Document Recognition, January, 1998.
R. Mohan, J. R. Smith and C-S. Li. Multimedia Content Customization for Universal
Access, Multimedia Storage and Archiving Systems, SPIE Vol 3527, Boston,
November 1998.
R. Mohan, J. R. Smith and C-S. Li. Adapting Multimedia Internet Content for
Universal Access, IEEE Transactions on Multimedia, Vol. 1, No. 1, March
1999.
R. Mohan, J. R. Smith and C-S. Li. Adapting Content to Client Resources in the
Internet, IEEE Intl. Conf on Multimedia Comp. and Systems ICMCS99,
Florence, June 1999.
R. Mohan, J.R. Smith and C-S. Li, Content Adaptation Framework: Bringing the
Internet to Information Appliances, Globecom 99, Dec 1999.
6.5 Educational Applications
Tagg, Philip (1981). On the Specificity of Musical Communication: Guidelines for Non-Musicologists. SPGUMD 8115 (23 pp.).
Tagg, Philip (1984). Understanding 'Time Sense': concepts, sketches, consequences. Tvorspel: 21-43. [Forthcoming on-line, see Tagg 1987.]
Tagg, Philip (1990). Music in Mass Media Studies: Reading Sounds for Example. PMR 1990: 103-114.
Annex B: An example architecture for MPEG-7 Pull applications
A search engine could freely access any complete or partial description associated with any AV object in any set of data, perform a ranking and retrieve the data for display by using the link information. An example architecture is illustrated in Fig. 1.
[Figure 1: Example of a client-server architecture in an MPEG-7 based data search.]