Contextual Ontology based Multimedia metadata application system
describing framework
1. Introduction
As Mark Weiser pointed out [1], computing is becoming pervasive. In the
pervasive computing domain, multimedia data is widely used and plays an
increasingly important role in many systems, especially in surveillance.
However, because of the huge semantic gap between multimedia data and its
interpretation, an intelligent surveillance system faced with multimedia data
captured by various sensors cannot derive the correct context or understand the
user's intention, and therefore cannot provide automatic and accurate services.
As a result, such systems often need human assistance. It is widely accepted
that the gap between data and semantic meaning limits the development of
surveillance systems. We believe that the metadata describing schema plays a
key role in bridging this gap. However, because current multimedia metadata
describing schemas are isolated from the system's information and architecture,
their usefulness is limited. In addition, current schemas make it difficult to
analyze media content automatically and offer no effective way to express the
results of feature extraction. The lack of precise models and formats for
object and system representation and the high complexity of multimedia
processing algorithms make the development of fully automatic semantic
multimedia analysis and management systems a challenging task [2].
Many research groups are actively proposing solutions and standards for the
management and exchange of multimedia data. Within the MUSCLE NoE, research
focuses on standards, technologies and techniques for integrating, exchanging
and enhancing the use of multimedia within a variety of research areas. At CNR
ISTI, Patrizia Asirelli et al. are developing an infrastructure for MultiMedia
Metadata Management (4M) to support the integration of media from different
sources; it enables the collection, analysis and integration of media for
semantic annotation, search and retrieval [3]. Jeffrey E. Boyd et al. embed the
low-level functions performed by a video surveillance system into cameras. They
build video information servers that are conceptually similar to MPEG-7
cameras, but differ in that they interact with client applications and can be
configured dynamically to do more than describe video content [4]. Trivedi et
al. present an overall system architecture to support the design and
development of intelligent environments. Their system captures data with
omni-camera and PTZ camera; sensor data are encoded into XML and stored in a
knowledge base through the system's Server Core. The system includes modules
for multi-camera multi-person tracking, event detection, event-based servoing
for selective attention, voxelization, and streaming face recognition [5].
In this paper we present an ontology-based framework for describing a
multimedia metadata application system. The framework addresses the
representation and communication of data, metadata and other system
information. By defining ontologies, we can implement exchange and interaction
among the modules of a system, and even among different systems, and support
context-aware services through a dynamic context model.
2. Related work on visual data description
Because XML offers good readability, generality and extensibility, it is widely
accepted that XML-based languages are a good approach to describing multimedia
data, e.g. MPEG-7, CVML, and VERL & VEML.
Thor List and Robert B. Fisher proposed the XML-based Computer Vision Markup
Language (CVML) for use in cognitive vision, to enable separate research groups
to collaborate with each other and to make their results more accessible to
other areas of science and industry [6]. CVML emphasizes low-level features but
offers little support for high-level semantics.
In the "Challenge Project on Video Event Taxonomy" sponsored by the Advanced
Research and Development Activity (ARDA) of the U.S., more than 30 researchers
in computer vision and knowledge representation, together with representatives
of the user community, proposed VERL, a formal language for describing event
taxonomies. They also proposed VEML (Video Event Markup Language), which
annotates the event instances of VERL. VERL & VEML are strongly object-oriented
and highly abstract, but lack low-level feature expression (Fig1) [7].
Fig1 Diagram of the relationship between VERL and VEML
MPEG-7, the Multimedia Content Description Interface [8], was proposed by the
MPEG committee in 1996 and became an international standard in 2001. It can
describe both low-level features and high-level semantics. However, although it
is expressive, the semantics of its elements have no formal grounding, and the
resulting interoperability problems prevent an effective use of MPEG-7 as a
language for describing multimedia [9].
Fig2 Use scope of MPEG-7
3. Requirements for designing a multimedia metadata describing schema
An original purpose of designing a multimedia metadata schema is to annotate
multimedia data so that its content can be searched and retrieved. As we enter
the pervasive computing era, multimedia metadata should play a more fundamental
and important role. Recognizing that metadata will be the hinge connecting all
modules in a system, we argue that the metadata describing schema should be
based on the information system itself. The system's different modules exchange
information through metadata; for example, the storage module indexes and
stores the outputs and intermediate results of other modules as metadata, and
context information is also derived from metadata. This leads to the following
requirements for multimedia metadata:
Extensibility. In multimedia applications, metadata expression is not static.
For example, the application system might add new modules or modify existing
ones, which requires the metadata to be extensible enough to adapt to the
system's changes. In addition, the objects described by the metadata may change
or grow. Currently the main media types are video, audio, etc., but as
technology evolves, new media may appear; the metadata schema should not have
to be redesigned for every new medium.
Interoperability (readability). When designing the metadata describing schema,
we should consider information fusion, both among different modules and among
different systems. A multimedia application system consists of various modules,
and many tasks require their cooperation, which demands a common understanding
of the subject at hand. Here, interoperability means that a processing unit can
understand the results of other units and adapt itself accordingly. In
addition, different multimedia application systems often use different
information formats, which hinders fusion across systems. Metadata with good
readability eases data maintenance and the transformation between formats, and
also helps developers maintain the system.
Easy to parse and to transport over the network. Ongoing research on multimedia
application systems is moving toward multi-sensor, heterogeneous and
distributed architectures. Different computing units must share information and
communicate with each other via the network. A distributed system demands that
the metadata describing this information be easy to transport and easy to
parse, so that information can be exchanged and processed quickly and
efficiently.
Ability to describe and reason about context information. Because the same
action may convey different meanings in different contexts, a multimedia
application system needs context information to detect and understand events.
Making the system intelligent should be a starting point and target of metadata
schema design; otherwise the metadata will limit its own usage. The design of
the metadata should support the expression of context information, ranging from
low-level features to high-level semantic descriptions, as well as more complex
expressions that combine different kinds of information.
Easy error checking and tracing. Because low-level feature extraction involves
uncertainty, errors sometimes occur that are only discovered later; this
requires an error-checking mechanism. Without such a mechanism, erroneous
decisions propagate to later processing stages and lead to wrong results. With
it, the system should record the state of each error check, but without slowing
down the processing of updated metadata.
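To make the parse-and-transport requirement concrete, the sketch below parses a
hypothetical XML metadata message as it might arrive from a data-capturing
module over the network. The element names are illustrative only, not part of
any actual schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical metadata message, as it might arrive over a socket.
# Element and attribute names are invented for illustration.
message = b"""<Metadata module="MotionTracking_1">
  <Blob id="Blob_1" x="120" y="45" width="30" height="80"/>
  <Blob id="Blob_3" x="150" y="40" width="28" height="75"/>
</Metadata>"""

root = ET.fromstring(message)
sender = root.get("module")
# Index each blob by id, keeping its top-left position.
blobs = {b.get("id"): (int(b.get("x")), int(b.get("y")))
         for b in root.findall("Blob")}
print(sender, blobs)
```

Because the message is plain XML, any module with a standard parser can consume
it, which is exactly the interoperability the requirement asks for.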
4. The design of metadata describing framework based on ontology
When designing a multimedia metadata describing schema, we should not limit
ourselves to constructing media metadata alone. We propose a framework that
takes the system's processing platform into account: it defines not only media
data but also other information, including context and system settings. As a
whole, we call the describing schema MeSysONT (Multimedia System Ontology).
Because our framework is based on the system's architecture, this paper briefly
introduces the Software Platform for Distributed Visual Information Processing
that we proposed in 2007 [10].
4.1 Brief introduction to the Software Platform for Distributed Visual
Information Processing
A Distributed Visual Information Processing (DiVI) system is a distributed
intelligent system that employs distributed camera arrays. The platform adopts
a multi-level system organization and a multi-server architecture. The division
between common services and applications simplifies application development and
deployment and improves the whole system's flexibility (Fig3).
Fig3. An overview of system architecture
With this architecture, a new processing unit can be added to the platform
conveniently, with little interference to the rest of the system. For
transparent communication between the platform and applications, a set of XML
read-write classes is also provided; these classes define the message format
and encapsulate the serialization and deserialization of messages. This use of
XML also gives the platform agility for future demands [10].
4.2 Why use ontology?
New computing technologies will play an increasingly important role in
assisting people's lives. However, an obstacle to developing such application
systems is the huge gap between data and understanding. A system cannot infer a
person's meaning merely by capturing his actions; it must know the scenario and
the dynamic context, just as a person has common-sense knowledge. Taking this
into account, we introduce the concept of ontology into our describing
framework.
The term “ontology” was borrowed from philosophy and was introduced into
the knowledge engineering field as a means of abstracting and representing
knowledge. Ontologies are used to build consensual terminologies for the domain
knowledge in a formal way so that they can be more easily shared and reused.
More recently, ontologies have been applied in many fields of computer science,
such as the Semantic Web, e-commerce, and information systems [11]. Moreover,
ontologies have been used in context-aware systems in the pervasive computing
domain, e.g. CoBrA, Gaia, SOCAM and so on.
Organizing domain knowledge explicitly with an ontology makes correction easy
and convenient. An ontology can supply exactly the information needed to detect
the general context or what the user wants. In particular, intelligent agents
need a well-defined ontology to understand situational information and to
reason about vague situations (context understanding). Moreover, designing a
context base yields an immediate benefit: the required context can be described
explicitly, and the system can process it by composing contexts according to
the organizational model.
Furthermore, an ontology can help computer vision algorithms achieve better
results and make them applicable to assisting people's lives. For example, a
major problem when applying CV algorithms is that the environment often
changes. If the system has an ontology of the physical environment and can
reason about the dynamic context, it can help an algorithm adapt its
parameters, improving its robustness and suitability. We can also define a
sensor information ontology to enable the system to select suitable sensors or
to combine them, which we believe can help solve the occlusion problem.
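As an illustration of context-guided parameter adaptation, the sketch below
picks a motion-detection threshold from an inferred lighting context. The
mapping, values and function name are hypothetical, not taken from our system.

```python
# Illustrative sketch: adapting a background-subtraction threshold from a
# lighting fact that ontology reasoning could provide. All values are invented.
def motion_threshold(context: dict) -> int:
    """Pick a detection threshold based on the inferred lighting context."""
    lighting = context.get("lighting", "daylight")
    # Hypothetical mapping: darker scenes need a more sensitive threshold.
    thresholds = {"daylight": 40, "artificial": 30, "night": 15}
    return thresholds.get(lighting, 40)

print(motion_threshold({"lighting": "night"}))  # more sensitive in the dark
print(motion_threshold({}))                     # falls back to daylight default
```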
4.3 Ontology-based describing schema
For multimedia applications in pervasive computing, we propose a framework that
describes information, including multimedia metadata, based on ontology. Its
basic idea is that the multimedia metadata is grounded in an ontology that also
describes the system's information, including its architecture, components and
so on.
Strengthening semantic expressiveness is the other main idea. The ontology
describes information ranging from raw data to the system's overall structure.
Additionally, we introduce the description of context information and a
reasoning mechanism so as to support context-aware services.
Fig4 shows an overview of the ontology framework, in which the ontology covers
various kinds of information. We divide the ontology into three types by
function: media-data-related, information-system-related, and
context-information ontology. Additionally, we define a task ontology so as to
describe the system's functions and provide services conveniently.
In our opinion, defining the application system's overall structure and the
interpretation of multimedia data as an ontology yields several benefits:
It makes multimedia data easy to understand.
It is convenient to analyze and reason over the data.
It is useful for fusing different modules and systems.
It helps developers understand and evolve the system.
The media-data-related ontology has two levels. The low level mainly expresses
the results of feature extraction and carries no semantic meaning; when
designing it, we should consider how to express changes in its fundamental
elements, which helps the feature-extraction algorithms. The high-level
description carries simple semantic meaning and provides the basic material for
context reasoning.
Fig4. Overview of contextual ontology framework
The information-system-related ontology mainly describes the application
system's structure and its functional modules. This design enables different
modules to cooperate with each other more conveniently, which helps the system
implement distributed computing.
The context-information-related ontology includes static and dynamic
ontologies. The static context mainly describes stable or rarely changing
information, e.g. physical environment information, network parameters, and
sensor parameters. The dynamic context mainly describes how an entity's
(including a human user's) state changes within a given static context. With
the task ontology, the system knows what service should be provided in a given
context.
It should be noted that in practice the ontology is not strictly layered: its
components have various relationships with one another. Fig5 shows an example
of a partial contextual ontology, in which many components are related to each
other through the Location ontology.
Fig5. An example of home scenario ontology
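The interplay of static context, dynamic context and task ontology described
above can be sketched as follows. The vocabulary here is illustrative and does
not reproduce the actual MeSysONT terms.

```python
# Illustrative sketch of the three context-related stores. All names invented.
static_context = {"Room_1": {"type": "LivingRoom", "camera": "Cam_1"}}
dynamic_context = {"Person_1": {"locatedIn": "Room_1", "state": "walking"}}
# Task ontology: which service fits a (room type, entity state) pair.
task_ontology = {("LivingRoom", "walking"): "track"}

def service_for(entity: str) -> str:
    """Combine dynamic state with static context to pick a service."""
    state = dynamic_context[entity]
    room_type = static_context[state["locatedIn"]]["type"]
    return task_ontology.get((room_type, state["state"]), "idle")

print(service_for("Person_1"))
```

Note how the lookup crosses both stores through the location, mirroring the way
the Location ontology links components in Fig5.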
5. Overview of the information processing architecture in the framework
A target of the framework is to enable the system to provide context-aware
services. With such services, the system not only obtains the current context
to reason about the user's intention, but can also guide the low-level
feature-extraction algorithms. Under our framework, different processing units
'know' each other better, so they cooperate more efficiently and conveniently.
Fig6. An overview of Information processing architecture
Fig6 shows a brief overview of the information processing architecture. Data
capturing modules acquire raw data from various sensors and pass it to the data
processing modules, which generate metadata according to the framework. The
analysis & reasoning module obtains the current context using the Dynamic
Context model [12]. Combined with the tasks defined in the ontology, the system
can understand the user's intention and provide active services. Additionally,
the current context can guide the low-level feature-extraction algorithms.
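The processing chain above can be sketched as a simple pipeline. The function
names, data and threshold rule are hypothetical placeholders for the real
modules, not the system's actual code.

```python
# Illustrative pipeline: capture -> process into metadata -> reason on context.
def capture() -> bytes:
    return b"raw-frame"                      # raw data from a sensor

def process(raw: bytes) -> dict:
    return {"blob": "Blob_1", "area": 240}   # metadata per the framework

def reason(metadata: dict) -> str:
    # A stand-in for the Dynamic Context model: derive a context label.
    return "intrusion" if metadata["area"] > 200 else "normal"

context = reason(process(capture()))
print(context)
```

In the real architecture the inferred context would also flow back to tune the
feature-extraction stage, closing the loop shown in Fig6.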
6. An implementation example in the intelligent surveillance domain
Based on the framework, we have implemented a prototype system named the MPEG-7
Based Video Surveillance Information System [13], used in a hall scenario. The
system can archive multimedia raw data and metadata and retrieve data by
content. One of its functions is to warn a person who is entering a restricted
space designated by the system manager.
Fig7 architecture of the Software platform
The system has a distributed structure. From Fig7 we can see that the whole
system consists of two layers. The Host Server is responsible for network
processing. An Application Module does not need to know how the other
Application Modules are deployed in the network; it only needs to connect to
the Host Server, which enhances the flexibility of the system. Encoding and
decoding of multimedia data is performed in the Host Server. Since each
computer runs exactly one Host Server, the system compresses or decompresses
the same multimedia data only once, which saves communication and computing
resources.
Fig8 is a diagram of the describing schema in a video surveillance scenario.
Context describes manager information, system settings, camera parameters and
so on. LLevel describes the blob information captured by the motion detection
module. HLevel describes the high-level semantic information, including the
relationships among motion blobs and the type of each motion object. The
relation between LLevel and HLevel is expressed by L2HGraph and H2LGraph.
Fig8 Diagram of describing schema
Following is an example of system settings in the context information.
<Host id="Host_1" ip="166.111.139.121">            <!-- host location -->
  <Port>
    <VideoListen>5001</VideoListen>                <!-- video data monitoring -->
    <MessageSendListen>6000</MessageSendListen>    <!-- message sending -->
    <MessageRevListen>6006</MessageRevListen>      <!-- message receiving -->
    <RemoteListen>7000</RemoteListen>              <!-- monitors connections from other hosts -->
  </Port>
  <Modules>                                        <!-- modules of the host -->
    <Module id="Module_1">
      <Name>MotionTracking_1</Name>                <!-- module name -->
      <Group>TrackingGroup</Group>                 <!-- belongs to TrackingGroup -->
      <Property>Vision Process</Property>          <!-- module's property -->
    </Module>
  </Modules>
  <Connections>                                    <!-- designated connections for transmitting messages -->
    <Sock>166.111.250.105</Sock>                   <!-- connect to host 166.111.250.105 -->
  </Connections>
</Host>
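A consumer of these settings might read them as shown below. This ad-hoc parser
is only a sketch for illustration; the real platform uses its own XML
read-write classes [10].

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the Host settings above, parsed with the standard library.
config = """<Host id="Host_1" ip="166.111.139.121">
  <Port><VideoListen>5001</VideoListen><RemoteListen>7000</RemoteListen></Port>
  <Modules><Module id="Module_1"><Name>MotionTracking_1</Name>
    <Group>TrackingGroup</Group></Module></Modules>
</Host>"""

host = ET.fromstring(config)
video_port = int(host.findtext("Port/VideoListen"))          # listening port
module_names = [m.findtext("Name") for m in host.iter("Module")]
print(host.get("ip"), video_port, module_names)
```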
The following example shows the relation between low-level features and
high-level semantic meaning.
<Relation xsi:type="SegmentSemanticBaseRelationType"
name="hasMediaPerceptionOf" source="#Body_1" target="#Blob_1"/>
<Relation xsi:type="SegmentSemanticBaseRelationType"
name="hasMediaPerceptionOf" source="#Body_1" target="#Blob_3"/>
<Relation xsi:type="SegmentSemanticBaseRelationType"
name="hasMediaPerceptionOf" source="#Body_1" target="#Blob_4"/>
The above example denotes that the high-level semantic entity Body_1 consists
of three blobs, Blob_1, Blob_3 and Blob_4, in the video picture. This approach
makes error checking easier. For example, if after some processing we find that
Body_2 and Body_5 are in fact the same entity, we can add a description as
follows.
<Relation xsi:type="SemanticBaseSemanticBaseRelation" name="equivalentTo"
source="#Body_2" target="#Body_5"/>
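A consumer of the metadata could apply such equivalentTo corrections when
resolving entity identities, along the lines of this sketch. The relation list
is invented for illustration and only mimics the XML relations above.

```python
# Relations as (source, name, target) triples, mimicking the XML examples.
relations = [
    ("Body_1", "hasMediaPerceptionOf", "Blob_1"),
    ("Body_1", "hasMediaPerceptionOf", "Blob_3"),
    ("Body_5", "hasMediaPerceptionOf", "Blob_7"),
    ("Body_2", "equivalentTo", "Body_5"),   # later correction, as in the text
]

# Map every corrected entity to its canonical name via equivalentTo relations.
canonical = {}
for src, name, tgt in relations:
    if name == "equivalentTo":
        canonical[tgt] = src                # Body_5 now resolves to Body_2

def resolve(entity: str) -> str:
    """Follow equivalentTo links until a canonical entity is reached."""
    while entity in canonical:
        entity = canonical[entity]
    return entity

print(resolve("Body_5"))
```

Because the correction is an added relation rather than an in-place edit, the
original (erroneous) state remains traceable, as the error-checking requirement
in Section 3 demands.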
7. Conclusions
A framework is proposed to support not only multimedia metadata but also
context-aware services. Multimedia metadata is integrated into the system's
description so that the data can be understood more deeply. The use of ontology
enables module fusion and context reasoning. The ontology-based framework
supports high-level abstraction of metadata and contextual information with the
power of a formal describing schema, which allows context inference to provide
more precise context information adapted to changing, heterogeneous smart-space
environments. Future research will investigate probabilistic description logic
approaches with more inference power to make the system more robust and
extensible.
References:
[1] Mark Weiser. The Computer for the 21st Century. Mobile Computing and
Communications Review, Volume 3, Number 3
[2] Dasiopoulou, S., Papastathis, V.K., Mezaris, V., Kompatsiaris, I., Strintzis,
M.G.: An ontology framework for knowledge-assisted semantic video analysis and
annotation. In: Proceedings of the 4th International Workshop on Knowledge
Markup and Semantic Annotation (SemAnnot 2004) at the 3rd International
Semantic Web Conference (ISWC 2004) (2004)
[3] Asirelli, P., Little, S., Martinelli, M., Salvetti, O.: MultiMedia Metadata
Management: a Proposal for an Infrastructure. In: SWAP 2006, Semantic Web
Technologies and Applications, December 18-20, Pisa, Italy (2006)
[4] Jeffrey E. Boyd, Maxwell Sayles, Luke Olsen, Paul Tarjan. Content
description servers for networked video surveillance. International Conference
on Information Technology: Coding and Computing, ITCC 2004, 2004, pp. 798-803
[5] Trivedi, M.M.; Huang, K.S.; Mikic, I.; Dynamic context capture and
distributed video arrays for intelligent spaces. Systems, Man and Cybernetics, Part
A, IEEE Transactions on Volume 35, Issue 1, Jan. 2005 Page(s):145 – 163
[6] Thor List and Robert B. Fisher, “CVML – An XML-based Computer
Vision Markup Language”, Proc. Int. Conference. on Pattern Recognition.,
Cambridge. 1. 789-792. 2004
[7] Ram Nevatia, Jerry Hobbs and Bob Bolles, “An Ontology for Video Event
Representation”, 2004 International Conference on Computer Vision and Pattern
Recognition Workshop (CVPRW'04) Volume 7
[8] J.M. Martinez. MPEG-7 Overview (version 10). ISO/IEC JTC1/SC29/WG11 N6828,
Palma de Mallorca, October 2004
[9] Raphaël Troncy, Werner Bailer, Michael Hausenblas, Philip Hofmair, and
Rudolf Schlatte. Enabling Multimedia Metadata Interoperability by Defining
Formal Semantics of MPEG-7 Profiles. In Y. Avrithis et al. (Eds.): SAMT 2006,
LNCS 4306, pp. 41–55, 2006.
[10] Yao Wang, Linmi Tao, Qiang Liu, Yanjun Zhao, Guangyou Xu. A flexible
multi-server platform for distributed video information processing. In The 5th
International Conference on Computer Vision Systems, 2007
[11] Juan Ye, Lorcan Coyle, Simon Dobson and Paddy Nixon.
Ontology-based models in pervasive computing systems. In The Knowledge
Engineering Review, Vol. 22:4, 315–347, 2007.
[12] Peng Dai, Guangyou Xu. Event Driven Dynamic Context Model for
Group Interaction Analysis. In Proc. International Conference on Soft Computing
and Human Sciences, Kitakyushu, Japan, 2-5 Aug.,2007 (SCHS'07)
[13] Yanjun Zhao. MPEG-7 Based Video Surveillance Information System. Master's
thesis, Tsinghua University, Beijing, 2007