From: AAAI Technical Report SS-97-02. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

Natural Language Indexing of Multimedia Objects in the Context of a WWW Distance Learning Environment

Gian Piero Zarri
Centre National de la Recherche Scientifique
54, boulevard Raspail
75270 Paris Cedex 06, France
zarri@cams.msh-paris.fr

Abstract

All specialists agree that the possibility of adding to multimedia objects some sort of "conceptual" annotation, describing their information content in a form suitable for computer processing, would contribute greatly to solving the problem of their "intelligent" indexing and retrieval. Existing projects in this domain, like Information Manifold, UNTANGLE or MIHMA, propose to build up the conceptual annotation (or "generalised bookmark") using very complex conceptual languages, such as description logics. The major drawback of this approach is the difficulty of associating "by hand" a (syntactically very complex) generalised bookmark in the description logics style with the WWW objects of a sufficiently large multimedia universe. To alleviate this problem, we propose -- in the context of a newly started project concerning the establishment of detailed blueprints (and of a prototype) of a full object-oriented WWW Distance Learning Environment -- to associate with the WWW objects not the final conceptual annotation, but a simple natural language (NL) caption in the form of a short text representing a general, neutral description of the informational content of the object. To accelerate this sort of operation, it could also be possible to make use of dictation systems like DragonDictate or IBM VoiceType. The NL caption should then be converted into a conceptual annotation in NKRL (Narrative Knowledge Representation Language), making use of an automatic "translation" system like those we have recently implemented in the context of projects partially financed by the European Union.
The general framework of WebLearning

The architecture of the WebLearning environment is structured around three main building blocks.

The central kernel of WebLearning consists of a multimedia object-oriented data management system, characterised by the notions of objects, object encapsulation, classes, and inheritance (Bertino & Martino 1993). Its basic functions concern a) storing and retrieving the multimedia elements to be used for assembling the learning material, and b) routing this material to the users. The multimedia elements are essentially texts, images, video, audio and formatted data. To facilitate their retrieval, they will be annotated using "conceptual annotations", see the next Section. The kernel must also be able to manage compound multimedia objects, i.e., structured sets of elements pertaining to different media. Compound objects may be "static" or "dynamic". Static compounds are created by linking their constituent elements together with predetermined conceptual links, like those used in hypertextual systems. Dynamic compounds are created when necessary, and they are structured according to a temporal dimension. The kernel is also concerned with the management of a repository of distance learning material available on the server (LR, Learning Repository), realised as a collection of WWW pages allowing free access to this material.

The second building block consists of a sub-environment specifically dedicated to the "authoring" activities which will allow the creation of the learning material -- i.e., essentially, the creation, organisation and feeding of the Learning Repository. Three main components must be realised here. The first is a specialised WWW browser allowing the retrieval on the Internet of the information and data to be used for assembling the distance learning material -- in a first phase, it can simply be a specialisation of existing WWW search engines, like Lycos and Yahoo.
The second component will consist of a specialised editor allowing the definition of the static and dynamic compound objects. The third will consist of a set of tools allowing the production of the final, conceptual annotation of the chosen (simple and compound) multimedia elements, see again the next Section.

The last building block is a sub-environment supplying the student with the tools needed to execute an "intelligent" retrieval of the learning material and to allow its optimal utilisation. In particular, the use of this material must be personalised and, possibly, assisted through the recourse to exercises or tests. In this context, the main problem is, obviously, that of the correct retrieval of the annotated material; a more specific problem concerns the development of efficient tools to manage a cache memory devoted to the storage of multimedia data and compound objects, which must be inserted in the client node to which the user will be connected.

The annotation problem

In the context of any project concerning the retrieval of WWW objects, a central problem is how to build up a powerful and flexible indexing framework. In the WebLearning project, we have chosen to make use of some sort of generalised "access by content" -- as opposed to the physically-based techniques that retrieve data on the basis of some "external" characteristics of the supports, such as, for images and videos, colour, shape, texture, motion patterns, scene breaks, pauses in the audio, camera pans and zooms, etc. As is well known, an "authentic" access by content is particularly difficult to envisage for non-textual material (images, video, audio). Admittedly, some very encouraging experiments exist that allow the external characteristics mentioned above to be analysed automatically, to a certain extent, see, e.g., (MacNeil 1991), (Mills, Cohen, & Wong 1992), (Zhang, Kankanhalli, & Smoliar 1993).
However, for the time being, this information alone is not sufficient to allow the creation of a detailed representation of the semantic content of non-textual documents that could support their content-based retrieval and routing.

In the past, indexing (and therefore access) by content has often been realised using keywords. The limitations of this approach are well known. They range from the discrepancies in the choice of keywords between those who build up the indexes and the real users, to the fact that keywords cannot describe the complex temporal structure of video and audio information, to, finally, problems concerning order, like the impossibility of distinguishing, in a pure keyword approach, between the two situations "an old man biting a dog" and "an old dog biting a man". Consequently, several researchers have recently proposed to make use of some sort of structured, "conceptual" annotation describing in some depth the information content of the WWW objects to be retrieved, see projects like UNTANGLE (Welty 1994), MIHMA (Hoppe et al. 1996) or Information Manifold (Kirk 1996). These three projects realise the conceptual annotations (or "generalised bookmarks") using description logics languages, see (Brachman et al. 1991). Such generalised bookmarks are then inserted in a knowledge base that the user can consult by employing advanced retrieval and inference techniques -- instead of consulting a simple, unstructured list of pure bookmarks.

In our opinion, such an approach nevertheless has a major drawback. Even if a solution in UNTANGLE's style is adopted -- i.e., that of searching first for interesting WWW pages and other on-line information using the "traditional" tools (search engines etc.)
and then, when such information is discovered, producing the corresponding conceptual annotations -- there is still the difficulty of creating "by hand" a (syntactically very complex) generalised bookmark in the description logic style for the WWW objects of a sufficiently large multimedia universe.

To alleviate this problem, we propose to explore in WebLearning a Natural Language (NL) approach that we consider sufficiently general, and not suitable only for our distance learning context. We therefore suggest associating with the original WWW objects, or with those selected after retrieval through the search engines, not the final conceptual annotation but a simple NL caption in the form of a short text representing a general, neutral description of the content of the document. To accelerate this sort of operation, it could also be possible to make use of dictation systems like DragonDictate or IBM VoiceType. Before the objects are definitively linked with (in our case, see the previous Section) the Learning Repository, the captions should be converted into conceptual annotations: we propose to represent them in NKRL (Narrative Knowledge Representation Language) terms, thanks to the automatic "translation" tools from NL to NKRL that have recently been implemented in the context of two European projects: NOMOS, Esprit P5330, and COBALT, LRE P61011.
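The ordering problem that defeats the pure keyword approach discussed above can be made concrete with a small sketch. It is written under our own assumptions: the two-role Annotation record, the Entity class and the find helper are hypothetical simplifications, not NKRL or any of the description-logic systems cited above, and the structured annotations are written by hand rather than produced by a translation tool.

```python
from dataclasses import dataclass

STOPWORDS = {"a", "an", "the"}


def keywords(caption):
    """Keyword indexing: reduce a caption to an unordered set of content words."""
    return frozenset(w for w in caption.lower().split() if w not in STOPWORDS)


@dataclass(frozen=True)
class Entity:
    concept: str            # e.g. "man_", "dog_"
    attributes: tuple = ()  # e.g. ("old_",)


@dataclass(frozen=True)
class Annotation:
    """Hypothetical role-based annotation: who does what to whom."""
    predicate: str
    subj: Entity
    obj: Entity


caption_1 = "an old man biting a dog"
caption_2 = "an old dog biting a man"

# Hand-written structured annotations for the two captions.
annotation_1 = Annotation("bite", Entity("man_", ("old_",)), Entity("dog_"))
annotation_2 = Annotation("bite", Entity("dog_", ("old_",)), Entity("man_"))


def find(annotations, predicate, subj_concept):
    """Return the annotations whose predicate and subject concept match the query."""
    return [a for a in annotations
            if a.predicate == predicate and a.subj.concept == subj_concept]


if __name__ == "__main__":
    # Same keyword set: a keyword index cannot tell the two situations apart.
    print(keywords(caption_1) == keywords(caption_2))  # True
    # Different structured annotations: the roles keep them distinct.
    print(annotation_1 == annotation_2)  # False
    # "Which images show a dog biting someone?" retrieves only the second caption.
    print(find([annotation_1, annotation_2], "bite", "dog_") == [annotation_2])  # True
```

The keyword sets of the two captions coincide, so no keyword query can separate them, whereas a query on the subject role retrieves only the caption in which the dog does the biting.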
NKRL has been extensively described in the literature, see, e.g., (Zarri 1992), (Zarri 1994), (Zarri 1995). It is a high-level knowledge representation language endowed with particular features which make it well suited for representing descriptive meanings. We can mention here the presence, besides the traditional "hierarchy (ontology) of concepts", of an "ontology of events" (a hierarchical organisation of templates representing the formal description of general classes of real-world events, like "moving a physical object" or "having a particular attitude towards someone"), or the possibility of describing the temporal characteristics of specific events (the latter are represented as instances of the NKRL templates). Moreover, NKRL allows for the representation of implicit and explicit enunciative situations, of wishes, desires, and intentions, of plural situations, of causality, and of complex second-order, intertwined constructions. In addition to the tools allowing for the translation of original NL documents into their proper NKRL representation, the NKRL technology also includes a set of tools for executing an "intelligent" exploitation of the conceptual representation. These allow, inter alia, the retrieval -- by the automatic, semantic transformation of a user query that failed -- of information which is conceptually (but not formally) equivalent to the information originally searched for.

To give only a simple example, Figure 1 reproduces the NKRL representation of (a fragment of) an NL caption like "Three girls are lying on the beach" that could be associated with a WWW image.

    c1)  EXIST  SUBJ   (SPECIF girl_1 (SPECIF cardinality_ 3)): (beach_)
                MODAL  lying__position

    [ girl_1
        InstanceOf : girl_
        HasMember  : 3 ]

Figure 1 - NKRL representation of an NL caption.

The caption is represented according to the rules for encoding "plural situations" in NKRL, see, e.g., (Zarri 1995). Accordingly, an "occurrence" c1, instance of a basic NKRL template, brings along the main characteristics of the event to be represented.
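As a rough illustration, the two structures of Figure 1 can be mirrored in a few Python data classes. This is a sketch under our own assumptions, not the NKRL software: the class names (Individual, Specif, Occurrence), the render method and the is_plural helper are hypothetical, and only the SPECIF operator, the colon code attaching a location to a role filler, and the HasMember slot are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """An NKRL-style individual such as girl_1 (hypothetical Python model)."""
    name: str
    instance_of: str     # the concept it instantiates, e.g. "girl_"
    has_member: int = 0  # non-zero: the individual stands for several instances


@dataclass
class Specif:
    """A SPECIF(ication) expansion: a head plus properties asserted about it."""
    head: object
    properties: tuple


def fmt(arg):
    """Render a role filler in the bracketed notation of Figure 1."""
    if isinstance(arg, Specif):
        parts = [fmt(arg.head)] + [fmt(p) for p in arg.properties]
        return "(SPECIF " + " ".join(parts) + ")"
    if isinstance(arg, Individual):
        return arg.name
    return str(arg)  # concepts such as "beach_" and plain numbers


@dataclass
class Occurrence:
    """An instance of an NKRL template: a predicate plus role -> filler pairs."""
    label: str
    predicate: str
    roles: dict
    locations: dict = field(default_factory=dict)  # role -> location attribute

    def render(self):
        out = [f"{self.label})", self.predicate]
        for role, filler in self.roles.items():
            text = f"{role} {fmt(filler)}"
            if role in self.locations:  # the ":" location code
                text += f": ({self.locations[role]})"
            out.append(text)
        return " ".join(out)


def is_plural(ind):
    """A non-empty HasMember slot marks a plural situation."""
    return ind.has_member > 0


girl_1 = Individual("girl_1", instance_of="girl_", has_member=3)
c1 = Occurrence(
    label="c1",
    predicate="EXIST",
    roles={"SUBJ": Specif(girl_1, (Specif("cardinality_", (3,)),)),
           "MODAL": "lying__position"},
    locations={"SUBJ": "beach_"},
)

if __name__ == "__main__":
    # c1) EXIST SUBJ (SPECIF girl_1 (SPECIF cardinality_ 3)): (beach_) MODAL lying__position
    print(c1.render())
    print(is_plural(girl_1))  # True
```

Keeping the predicate, the role fillers and the location attributes in separate fields, rather than in one flat string, is what would let a retrieval component match on individual roles.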
The non-empty HasMember slot in the "individual" girl_1, instance of an NKRL "concept" (girl_), makes it clear in turn that this individual refers to several instances of girl_. Please note that italic type is systematically used to represent a concept, while roman type represents an individual. A "location attribute" (a list) is associated with the argument (role filler) of the SUBJ(ect) role in c1 using the colon code, ":". lying__position (the predicate argument introduced by MODAL(ity)) and beach_ are both represented as generic concepts, since nothing is said about their possible, peculiar characteristics. The "attributive operator" SPECIF(ication), which appears in c1, is one of the NKRL operators used to build up "structured arguments" (expansions) of the NKRL conceptual predicates (EXIST in this case); SPECIF is used to represent some of the properties which can be asserted about the first element of an expansion list.

Conclusion

We can make here two additional remarks.

• The first is the acknowledgement that the idea of indexing and exploring a base of documents by associating, in a first step, some NL comments with these documents is certainly not a new one -- see, e.g., in a specific WWW context, the recent (Basili & Pazienza 1996). However, the bulk of the existing experiments in this direction seems to fluctuate between very simple, low-level rule-based techniques making use of elementary semantic categories like those included in WordNet (Miller 1990) (a_kind_of, substance_of, part_of and member_of for nouns, entails and causes_ for verbs, satellite_of and pertains_to for adjectives), see (Chakravarthy 1994), and the inference-intensive applications of the extremely complex machinery of CYC proposed by Lenat and his colleagues, see (Guha & Lenat 1994). The NKRL approach, see again the NKRL literature, seems to represent an acceptable compromise between these two extreme positions.

• The second consists in noticing that an obvious drawback of any NL approach to automatic indexing is the theoretical impossibility, given the current "state of the art" of NLP technology, of systematically obtaining the "correct", automatic (conceptual) translation of an NL text, even of a very short one. This situation can, however, be pragmatically improved thanks to the use, separately or co-operatively, of several well-known techniques: allowing the inclusion, in the knowledge repository, of conceptual descriptions of WWW objects which can be incomplete and even (partially) erroneous; making use of techniques of computer-assisted translation; using some form of "controlled natural language" in order to induce an important simplification of the "translation" operations, see (Proceedings 1996); etc.

A further subject of investigation -- which, in the WebLearning project, represents only a subsidiary direction of research -- consists in the idea of coupling NKRL, for conceptual annotation purposes, with an iconic visual language in the Media Stream style (Davis 1993). This should allow, for non-textual documents, the use of conceptual descriptions of these documents (i.e., temporally-stamped NKRL templates) where the "ordinary" NKRL concepts will be mixed with the Media Stream iconic primitives. The latter are grouped into general classes like "character actions", "object actions", "characters", "time", "space", "cinematography", "recording medium", "transitions": some of these Media Stream classes already exist in the NKRL hierarchical classification of concepts, and the integration of the remaining ones in the NKRL concept hierarchy should not constitute too difficult an endeavour.

References

Basili, R., and Pazienza, M.T. 1996. MIDDLEMAN Inc.: Linguistic-based IR Tools for W3 Users. In Proceedings of the WWW5 Workshop on Artificial Intelligence-based Tools to Help W3 Users (Paris, INRIA, May 1996, "http://www.info.unicaen.fr/~serge/3wia/workshop/").

Bertino, E., and Martino, L. 1993. Object-Oriented Database Systems - Concepts and Architectures. Reading, Mass.: Addison-Wesley.

Brachman, R.J., McGuinness, D.L., Patel-Schneider, P.F., Resnick, L.A., and Borgida, A. 1991. Living with CLASSIC: When and How to Use a KL-ONE-Like Language. In Principles of Semantic Networks, Sowa, J.F., ed. San Mateo, Calif.: Morgan Kaufmann.

Chakravarthy, A.S. 1994. Towards Semantic Retrieval of Pictures and Video. In Proceedings of the AAAI-94 Workshop on Indexing and Reuse in Multimedia Systems. Menlo Park, Calif.: AAAI Press.

Davis, M. 1993. Media Stream: An Iconic Visual Language for Video Annotation. In Proceedings of the 1993 IEEE Symposium on Visual Languages. Los Alamitos, Calif.: IEEE Computer Society Press.

Guha, R.V., and Lenat, D.B. 1994. Enabling Agents to Work Together. Communications of the ACM 37(7): 127-142.

Hoppe, T., Kindermann, C., Paulus, K.O., Tolksdorf, R., Buu, E., Heimann, S., Schmiedel, A., and Voile, P. 1996. The MIHMA Project: A Web Information Service Based on Description Logics. In Proceedings of the WWW5 Workshop on Artificial Intelligence-based Tools to Help W3 Users (Paris, INRIA, May 1996, "http://www.info.unicaen.fr/~serge/3wia/workshop/").

Kirk, T. 1996. Knowledge Based Access to Information on the World Wide Web. In Proceedings of the WWW5 Workshop on Artificial Intelligence-based Tools to Help W3 Users (Paris, INRIA, May 1996, "http://www.info.unicaen.fr/~serge/3wia/workshop/").

MacNeil, R. 1991. Generating Multimedia Presentations Automatically using TYRO: The Constraint, Case-Based Designer's Apprentice. In Proceedings of the 1991 IEEE Workshop on Visual Languages. Los Alamitos, Calif.: IEEE Computer Society Press.

Miller, G.A. 1990. WordNet: An On-line Lexical Database. International Journal of Lexicography 3(4).

Mills, M., Cohen, J., and Wong, Y.Y. 1992. A Magnifier Tool for Video Data. In Proceedings of CHI'92. New York: ACM Press.

Proceedings of the First Workshop on Controlled Language Applications -- CLAW'96. Leuven: Centre for Computational Linguistics of the Katholieke Universiteit Leuven.

Welty, C. 1994. Knowledge Representation for Intelligent Information Retrieval. In Proceedings of the CAIA-94 Workshop on Intelligent Access to Digital Libraries (March 1994).

Zarri, G.P. 1992. The "Descriptive" Component of a Hybrid Knowledge Representation Language. In Semantic Networks in Artificial Intelligence, Lehmann, F., ed. Oxford: Pergamon Press.

Zarri, G.P. 1994. A Glimpse of NKRL, the "Narrative Knowledge Representation Language". In Knowledge Representation for Natural Language Processing in Implemented Systems - Papers from the 1994 Fall Symposium, Ali, S., ed. Menlo Park, Calif.: AAAI Press.

Zarri, G.P. 1995. Representing and Querying Complex Conceptual Structures in the Framework of NKRL, the "Narrative Knowledge Representation Language". In Supplementary Proceedings of the 3rd International Conference on Conceptual Structures, Ellis, G., Levinson, R., Rich, W., and Sowa, J., eds. Santa Cruz, Calif.: Department of Computer and Information Sciences of the University of California.

Zhang, H., Kankanhalli, A., and Smoliar, S.W. 1993. Automatic Partitioning of Full-Motion Video. Multimedia Systems 1: 10-28.