From: AAAI Technical Report SS-97-02. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Natural Language Indexing of Multimedia Objects in the Context of a WWW Distance Learning Environment
Gian Piero Zarri
Centre National de la Recherche Scientifique
54, boulevard Raspail
75270 Paris Cedex 06, France
zarri@cams.msh-paris.fr
Abstract
Specialists agree that adding to multimedia objects some sort of "conceptual" annotation, describing their information content in a form suitable for computer processing, would contribute greatly to solving the problem of their "intelligent" indexing and retrieval. The existing projects in this domain, like Information Manifold, UNTANGLE or MIHMA, propose to build up the conceptual annotation (or "generalised bookmark") making use of very complex conceptual languages, like some sort of description logics. The major drawback of this approach is the difficulty of associating "by hand" a (syntactically very complex) generalised bookmark in the description logics style with the WWW objects of a sufficiently large multimedia universe. To alleviate this problem, we propose, in the context of a newly started project which concerns the establishment of detailed blueprints (and of a prototype) of a full object-oriented WWW Distance Learning Environment, to associate with the WWW objects not the final conceptual annotation, but a simple natural language (NL) caption in the form of a short text representing a general, neutral description of the informational content of the object. To accelerate this sort of operation, it could also be possible to make use of dictation systems like Dragon Dictate or IBM VoiceType. The NL caption should then be converted into a conceptual annotation in NKRL (Narrative Knowledge Representation Language), making use of an automatic "translation" system like those we have recently implemented in the context of projects partially financed by the European Union.
The general framework of WebLearning
The architecture of the WebLearning environment is structured around three main building blocks.
The central kernel of WebLearning consists of a multimedia object-oriented data management system, characterised by the notions of objects, object encapsulation, classes, and inheritance (Bertino & Martino 1993). Its basic functions concern a) storing and b) retrieving the multimedia elements to be used for assembling the learning material, and c) routing this material to the users. The multimedia elements are essentially texts, images, video, audio and formatted data. To facilitate their retrieval, they will be annotated using "conceptual annotations", see the next Section. The kernel must also be able to manage compound multimedia objects, i.e., structured sets of elements pertaining to different media. Compound objects may be "static" or "dynamic". Static compounds are created by linking together their constituent elements with predetermined conceptual links, like those used in hypertext systems. Dynamic compounds are created when necessary, and they are structured according to a temporal dimension. The kernel also manages a repository of distance learning material available on the server (LR, Learning Repository), realised as a collection of WWW pages allowing free access to this material.
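
The distinction between simple elements, static compounds and dynamic compounds can be sketched as a small set of classes. This is only an illustration under our own assumptions: the class names, field names and the representation of links and timelines are hypothetical and are not taken from the WebLearning blueprints.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MediaElement:
    """A simple multimedia element (text, image, video, audio, formatted data)."""
    name: str
    medium: str
    annotation: str = ""  # placeholder for the conceptual annotation


@dataclass
class StaticCompound:
    """Elements joined by predetermined, hypertext-like conceptual links."""
    name: str
    elements: List[MediaElement] = field(default_factory=list)
    # Each link is (source element name, link label, target element name).
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    def link(self, source: MediaElement, label: str, target: MediaElement) -> None:
        # Register both ends of the link as members of the compound.
        for element in (source, target):
            if element not in self.elements:
                self.elements.append(element)
        self.links.append((source.name, label, target.name))


@dataclass
class DynamicCompound:
    """Elements assembled on demand and ordered along a time axis."""
    name: str
    # Each entry is (start time in seconds, element).
    timeline: List[Tuple[float, MediaElement]] = field(default_factory=list)

    def schedule(self, start: float, element: MediaElement) -> None:
        self.timeline.append((start, element))
        self.timeline.sort(key=lambda entry: entry[0])

    def playback_order(self) -> List[str]:
        return [element.name for _, element in self.timeline]
```

A static compound records who is linked to whom, whereas a dynamic compound only records when each element starts; this mirrors the "conceptual links" versus "temporal dimension" contrast made in the text.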
The second building block consists of a sub-environment specifically dedicated to the "authoring" activities which will allow the creation of the learning material, i.e., substantially, the creation, organisation and feeding of the Learning Repository. Three main components must be realised here. The first is a specialised WWW browser allowing the retrieval on the Internet of the information and data to be used for assembling the distance learning material; in a first phase, it can simply be a specialisation of the existing WWW search engines, like Lycos and Yahoo. The second component will consist of a specialised editor allowing the definition of the static and dynamic compound objects. The third will consist of a set of tools to allow the production of the final, conceptual annotation of the chosen (simple and compound) multimedia elements, see again the next Section.
The last building block is a sub-environment supplying the student with the tools needed to execute an "intelligent" retrieval of the learning material and to allow its optimal utilisation. In particular, the use of this material must be personalised and, possibly, assisted through the recourse to exercises or tests. In this context, the main problem is, obviously, that of the correct retrieval of the annotated material; a more specific problem concerns the development of efficient tools to manage a cache memory devoted to the storage of multimedia data and compound objects, which must be inserted in the client node to which the user will be connected.
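
The text does not say how the client-side cache should be managed. As one possible starting point, the sketch below uses a least-recently-used (LRU) eviction policy with a byte budget; the LRU policy, the class name MediaCache and its methods are our assumptions, not part of the project description.

```python
from collections import OrderedDict
from typing import Optional


class MediaCache:
    """Client-side cache that evicts the least recently used object first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None  # cache miss: the caller fetches from the server
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # too large to cache at all
        if key in self._items:
            # Replacing an entry: release the space of the old copy first.
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = data
        self.used_bytes += len(data)
        # Evict from the least recently used end until the budget is met.
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
```

A byte budget rather than an item count is used because multimedia objects vary widely in size; other policies (for example, weighting by download cost) would fit the same interface.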
The annotation problem
In the context of any project concerning the retrieval of WWW objects, a central problem concerns how to build up a powerful and flexible indexing framework.
In the WebLearning project, we have chosen to make use of some sort of generalised "access by content", in opposition to the physically-based techniques that consist in retrieving data on the basis of some "external" characteristics of the supports, such as, for images and videos, colour, shape, texture, motion patterns, scene breaks, pauses in the audio, camera pans and zooms, etc. As is well known, an "authentic" access by content is particularly difficult to envisage for non-textual material (images, video, audio). For example, some very encouraging experiments exist that allow the external characteristics mentioned before to be analysed automatically, to a certain extent, see, e.g., (MacNeil 1991), (Mills, Cohen, & Wong 1992), (Zhang, Kankanhalli, & Smoliar 1993). However, for the time being, this information alone is not sufficient to allow the creation of a detailed representation of the semantic content of non-textual documents that could support their content-based retrieval and routing.
In the past, indexing (and therefore access) by content has often been realised by using keywords. The limitations of this approach are well known. They range from the discrepancies in the choice of keywords between those who build up the indexes and the real users, to the fact that keywords cannot describe the complex temporal structure of video and audio information, to the problems, finally, concerning order, like the impossibility of distinguishing, in a pure keyword approach, between two situations, the first concerning "an old man biting a dog" and the second "an old dog biting a man". Consequently, several researchers have recently proposed to make use of some sort of structured, "conceptual" annotation describing in some depth the information content of the WWW objects to retrieve, see projects like UNTANGLE (Welty 1994), MIHMA (Hoppe et al. 1996) or Information Manifold (Kirk 1996). These three projects realise the conceptual annotations (or "generalised bookmarks") making use of description logics languages, see (Brachman et al. 1991). Such generalised bookmarks are then inserted in a knowledge base that the user can consult by employing advanced retrieval and inference techniques, instead of consulting a simple, unstructured list of pure bookmarks.
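
The word-order limitation of keywords can be checked directly. The short sketch below indexes the two captions as bags of keywords and compares them with a toy role-labelled structure; the Event type and its hand-written role assignments are hypothetical and are not the output of any parser or of the systems cited above.

```python
from collections import Counter
from typing import NamedTuple


def keyword_index(caption: str) -> Counter:
    """Index a caption as an unordered bag of lower-cased keywords."""
    return Counter(caption.lower().split())


class Event(NamedTuple):
    """A toy structured annotation: a predicate plus role-labelled arguments."""
    predicate: str
    subject: str
    obj: str


first = "an old man biting a dog"
second = "an old dog biting a man"

# The bags of keywords are identical, so a pure keyword search
# cannot tell the two situations apart.
same_keywords = keyword_index(first) == keyword_index(second)

# Hand-written role assignments (an assumption, for illustration only).
first_event = Event(predicate="bite", subject="old man", obj="dog")
second_event = Event(predicate="bite", subject="old dog", obj="man")
same_events = first_event == second_event

print(same_keywords, same_events)  # True False
```

Once the arguments are attached to roles, the two situations become distinct objects, which is the minimal property a structured annotation must provide.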
In our opinion such an approach has, however, a major drawback. Even if a solution in the UNTANGLE style is adopted, i.e., that of searching first for interesting WWW pages and other on-line information using the "traditional" tools (search engines etc.) and then, when such information is discovered, producing the corresponding conceptual annotations, there is still the difficulty of creating "by hand" a (syntactically very complex) generalised bookmark in the description logic style for the WWW objects of a sufficiently large multimedia universe. To alleviate this problem, we propose to explore in WebLearning a Natural Language (NL) approach that we consider sufficiently general, and not suitable only for our distance learning context. We suggest associating with the original WWW objects, or with those selected after retrieval through the search engines, not the final conceptual annotation, but a simple NL caption in the form of a short text representing a general, neutral description of the content of the document. To accelerate this sort of operation, it could also be possible to make use of dictation systems like Dragon Dictate or IBM VoiceType. Before we definitively link the objects with (in our case, see the previous Section) the Learning Repository, the captions should be converted into conceptual annotations: we propose to represent them in NKRL (Narrative Knowledge Representation Language) terms, thanks to the automatic "translation" tools from NL to NKRL that have recently been implemented in the context of two European projects: NOMOS, Esprit P5330, and COBALT, LRE P61011.
NKRL has been extensively described in the literature, see, e.g., (Zarri 1992), (Zarri 1994), (Zarri 1995). It is a high-level knowledge representation language endowed with particular features which make it well suited to representing descriptive meanings. We can mention here the presence, besides the traditional "hierarchy (ontology) of concepts", of an "ontology of events" (a hierarchical organisation of templates representing the formal description of general classes of real-world events, like "moving a physical object" or "having a particular attitude towards someone"), or the possibility of describing the temporal characteristics of the specific events (these last are represented as instances of the NKRL templates). Moreover, NKRL allows for the representation of implicit and explicit enunciative situations, of wishes, desires, and intentions, of plural situations, of causality, and of complex second order, intertwined constructions. In addition to the tools allowing for the translation of original NL documents into their proper NKRL representation, the NKRL technology also includes a set of tools for executing an "intelligent" exploitation of the conceptual representation. These allow, inter alia, the retrieval, by the automatic, semantic transformation of a user query that failed, of information which is conceptually (but not formally) equivalent to the information originally searched for.
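
The general idea of retrying a failed query in a transformed form can be sketched as follows. This is a deliberately simplified stand-in, not the NKRL transformation machinery: the fact base, the rewriting rule and all names below are hypothetical and serve only to show the control flow (try the query, and on failure try its rewritten forms).

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (predicate, subject, object)
Query = Tuple[str, str]      # (predicate, object); the subject is what we seek

# Hypothetical knowledge base and transformation rules, for illustration only.
FACTS: List[Fact] = [
    ("teach_at", "mary", "university"),
    ("visit", "john", "beach"),
]
RULES: Dict[Query, List[Query]] = {
    # "holds a degree" is treated here as implied by "teaches at a university".
    ("hold", "degree"): [("teach_at", "university")],
}


def search(query: Query, facts: List[Fact]) -> List[str]:
    """Return the subjects of all facts matching the query literally."""
    predicate, obj = query
    return [s for p, s, o in facts if p == predicate and o == obj]


def search_with_transformation(
    query: Query, facts: List[Fact], rules: Dict[Query, List[Query]]
) -> Tuple[Query, List[str]]:
    """Try the query as given; if it fails, retry with each rewritten form."""
    hits = search(query, facts)
    if hits:
        return query, hits
    for rewritten in rules.get(query, []):
        hits = search(rewritten, facts)
        if hits:
            return rewritten, hits
    return query, []
```

With these toy data, the query ("hold", "degree") has no literal match but succeeds after rewriting, returning the rewritten query together with the subject "mary". Returning the rewritten query alongside the hits lets the user see which transformation produced the answer.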
To give only a simple example, Figure 1 reproduces the NKRL representation of a (fragment of an) NL caption like "Three girls are lying on the beach" that could be associated with a WWW image.
    c1) EXIST  SUBJ   (SPECIF girl_1 (SPECIF cardinality_ 3)): (beach_)
               MODAL  lying_position

    [ girl_1
        InstanceOf : girl_
        HasMember  : 3 ]

Figure 1 - NKRL representation of an NL caption.
The caption is represented according to the rules for encoding "plural situations" in NKRL, see, e.g., (Zarri 1995). Therefore, an "occurrence" c1, instance of a basic NKRL template, brings along the main characteristics of the event to be represented. The non-empty HasMember slot in the "individual" girl_1, instance of an NKRL "concept" (girl_), makes it clear in turn that this individual is referring to several instances of girl_. Please note that the italic type style is systematically used to represent a concept_, while the roman style represents an individual_. A "location attribute" (a list) is associated with the argument (role filler) of the SUBJ(ect) role in c1 using the colon code, ":". lying_position (the predicate argument introduced by MODAL(ity)) and beach_ are both represented as generic concepts given that no details are given about their possible, peculiar characteristics. The "attributive operator", SPECIF(ication), which appears in c1, is one of the NKRL operators used to build up "structured arguments" (expansions) of the NKRL conceptual predicates (EXIST in this case); SPECIF is used to represent some of the properties which can be asserted about the first element of an expansion list.
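
To make the structure of Figure 1 concrete, the occurrence and the plural individual can be mirrored in a few data classes. This is a sketch under our own assumptions: the class names (Specif, Individual, Occurrence), the field names and the render helper are hypothetical and do not belong to any NKRL software; only the symbols taken from Figure 1 (EXIST, SUBJ, MODAL, SPECIF, girl_1, girl_, cardinality_, beach_) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

Argument = Union[str, "Specif"]


@dataclass
class Specif:
    """SPECIF expansion: a head followed by properties asserted about it."""
    head: str
    properties: List[Argument] = field(default_factory=list)


@dataclass
class Individual:
    """An individual such as girl_1; has_member marks a plural individual."""
    name: str
    instance_of: str
    has_member: Optional[int] = None


@dataclass
class Occurrence:
    """A predicate, its role fillers, and the location list of the SUBJ filler."""
    label: str
    predicate: str
    roles: Dict[str, Argument]
    subj_location: List[str] = field(default_factory=list)


def render(arg: Argument) -> str:
    """Print an argument back in the parenthesised notation of Figure 1."""
    if isinstance(arg, str):
        return arg
    parts = " ".join(render(p) for p in arg.properties)
    return f"(SPECIF {arg.head} {parts})" if parts else f"(SPECIF {arg.head})"


girl_1 = Individual(name="girl_1", instance_of="girl_", has_member=3)

c1 = Occurrence(
    label="c1",
    predicate="EXIST",
    roles={
        "SUBJ": Specif("girl_1", [Specif("cardinality_", ["3"])]),
        "MODAL": "lying_position",
    },
    subj_location=["beach_"],
)

print(render(c1.roles["SUBJ"]))  # (SPECIF girl_1 (SPECIF cardinality_ 3))
```

Keeping the location list separate from the SUBJ filler reflects the role of the colon code, which attaches a location to an argument rather than making it an argument of its own.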
We can make here two additional remarks.
• The first is the acknowledgement that the idea of indexing and exploring a base of documents by associating, in a first step, some NL comments with these documents is certainly not a new one, see e.g., in a specific WWW context, the recent (Basili & Pazienza 1996). However, the bulk of the existing experiments in this direction seems to fluctuate between very simple, low-level rule-based techniques making use of elementary semantic categories like those included in WordNet (Miller 1990) (a_kind_of, substance_of, part_of and member_of for nouns, entails and causes for verbs, satellite_of and pertains_to for adjectives), see (Chakravarthy 1994), and the inference-intensive applications of the extremely complex machinery of CYC proposed by Lenat and his colleagues, see (Guha & Lenat 1994). The NKRL approach, see again the NKRL literature, seems to represent an acceptable compromise between these two extreme positions.
• The second consists in noticing that an obvious drawback of any NL approach to automatic indexing concerns the theoretical impossibility, given the current "state of the art" of the NLP technology, of systematically obtaining the "correct", automatic (conceptual) translation of an NL text, even of a very short one. This situation can, however, be pragmatically improved thanks to the use, in a separate or co-operative way, of several well-known techniques: allowing the inclusion, in the knowledge repository, of conceptual descriptions of WWW objects which can be incomplete and even (partially) erroneous, making use of techniques of computer-assisted translation, using some form of "controlled natural language" in order to induce an important simplification of the "translation" operations, see (Proceedings 1996), etc.
Conclusion
A further subject of investigation, which in the WebLearning project represents only a subsidiary direction of research, consists in the idea of coupling NKRL, for conceptual annotation purposes, with an iconic visual language in the Media Stream style (Davis 1993). This should allow, for the non-textual documents, the use of conceptual descriptions of these documents (i.e., temporally-stamped NKRL templates) where the "ordinary" NKRL concepts will be mixed with the Media Stream iconic primitives. These last are grouped into general classes like "character actions", "object actions", "characters", "time", "space", "cinematography", "recording medium", "transitions": some of these Media Stream classes already exist in the NKRL hierarchical classification of concepts, and the integration of the residual ones in the NKRL concept hierarchy should not constitute too difficult an endeavour.
References
Basili, R., and Pazienza, M.T. 1996. MIDDLEMAN Inc.: Linguistic-based IR Tools for W3 Users. In Proceedings of the WWW5 Workshop on Artificial Intelligence-based Tools to Help W3 Users (Paris, INRIA, May 1996, "http://www.info.unicaen.fr/~serge/3wia/workshop/").
Bertino, E., and Martino, L. 1993. Object-Oriented Database Systems - Concepts and Architectures. Reading, Mass.: Addison-Wesley.
Brachman, R.J., McGuinness, D.L., Patel-Schneider, P.F., Resnick, L.A., and Borgida, A. 1991. Living with CLASSIC: When and How to Use a KL-ONE-Like Language. In Principles of Semantic Networks, Sowa, J.F., ed. San Mateo, Calif.: Morgan Kaufmann.
Chakravarthy, A.S. 1994. Towards Semantic Retrieval of Pictures and Video. In Proceedings of the AAAI-94 Workshop on Indexing and Reuse in Multimedia Systems. Menlo Park, Calif.: AAAI Press.
Davis, M. 1993. Media Stream: An Iconic Visual Language for Video Annotation. In Proceedings of 1993 IEEE Symposium on Visual Languages. Los Alamitos, Calif.: IEEE Computer Society Press.
Guha, R.V., and Lenat, D.B. 1994. Enabling Agents to Work Together, Communications of the ACM 37(7): 127-142.
Hoppe, T., Kindermann, C., Paulus, K.O., Tolksdorf, R., Buu, E., Heimann, S., Schmiedel, A., and Voile, P. 1996. The MIHMA Project: A Web Information Service Based on Description Logics. In Proceedings of the WWW5 Workshop on Artificial Intelligence-based Tools to Help W3 Users (Paris, INRIA, May 1996, "http://www.info.unicaen.fr/~serge/3wia/workshop/").
Kirk, T. 1996. Knowledge Based Access to Information on the World Wide Web. In Proceedings of the WWW5 Workshop on Artificial Intelligence-based Tools to Help W3 Users (Paris, INRIA, May 1996, "http://www.info.unicaen.fr/~serge/3wia/workshop/").
MacNeil, R. 1991. Generating Multimedia Presentations Automatically using TYRO: The Constraint, Case-Based Designer's Apprentice. In Proceedings of 1991 IEEE Workshop on Visual Languages. Los Alamitos, Calif.: IEEE Computer Society Press.
Mills, M., Cohen, J., and Wong, Y.Y. 1992. A Magnifier Tool for Video Data. In Proceedings of CHI'92. New York: ACM Press.
Miller, G.A. 1990. WordNet: An On-line Lexical Database, International Journal of Lexicography 3(4).
Proceedings of the First Workshop on Controlled Language Applications -- CLAW'96. Leuven: Centre for Computational Linguistics of the Katholieke Universiteit Leuven.
Welty, C. 1994. Knowledge Representation for Intelligent Information Retrieval. In Proceedings of the CAIA-94 Workshop on Intelligent Access to Digital Libraries (March 1994).
Zhang, H., Kankanhalli, A., and Smoliar, S.W. 1993. Automatic Partitioning of Full-Motion Video, Multimedia Systems, 1:10-28.
Zarri, G.P. 1992. The "Descriptive" Component of a Hybrid Knowledge Representation Language. In Semantic Networks in Artificial Intelligence, Lehmann, F., ed. Oxford: Pergamon Press.
Zarri, G.P. 1994. A Glimpse of NKRL, the "Narrative Knowledge Representation Language". In Knowledge Representation for Natural Language Processing in Implemented Systems - Papers from the 1994 Fall Symposium, Ali, S., ed. Menlo Park, Calif.: AAAI Press.
Zarri, G.P. 1995. Representing and Querying Complex Conceptual Structures in the Framework of NKRL, the "Narrative Knowledge Representation Language". In Supplementary Proceedings of the 3rd International Conference on Conceptual Structures, Ellis, G., Levinson, R., Rich, W., and Sowa, J., eds. Santa Cruz, Calif.: Department of Computer and Information Sciences of the University of California.