Towards a multimedia encyclopaedic lexicon for the Marquesan and Tuamotuan languages
Gaby Cablitz
Christian-Albrechts-Universität zu Kiel

Overview of this talk
  Motivation for our project
  Why multimedia dictionaries?
  Project objectives and basic design
  Some major developments for our project
  Examples of linking multimedia extensions with lexicographic data
  Web-based collaboration with speech communities

Motivation for the project
  How can a language documentation be made more accessible and usable to the speech and research community?
  Two problems:
  1. Limited ways of structuring the archive
  2. Primary data do not reveal much about language structure and relatedness between words of a language
  Annotation of multimedia documents shows the meaning of a word in specific contexts, not the network of associations between words nor the full range of meanings
    -> need for structural data to understand primary data
  Role of lexicography backgrounded in the DoBeS programme
  Dictionaries are necessary elements in language documentation projects

Multimedia dictionaries: beyond traditional lexicography and language archiving
  New ways of meaning presentation: linking of linguistic information with media files (video clips, photos, drawings, sound files)
  Multimedia extensions provide:
    -> information on pragmatics of lexical units (use in context)
    -> information on cultural knowledge related to meaning and use of lexical units (LU)
    -> non-verbal aspects of cultural activities relevant for understanding concepts encoded by LU
  New form of archiving: dense network of lexical entries with all kinds of media and archive files
  Moving from a conventional dictionary towards an encyclopaedia

Major project objectives
  Major objectives:
  1. Create a multimedia encyclopaedic lexicon for the Marquesan and Tuamotuan languages
  2. Advance development of LEXUS
  3. Involve the speech community actively in lexicon creation via web-based collaboration
  Upload as much non-archived multimedia data as possible with the lexical database in LEXUS (photos, drawings, photo galleries, etc.)
  Create links between lexical, multimedia and archive data in a thematically organised way
  Represent data by reflecting indigenous categorisation and understanding of relatedness between elements
  Create a database which is useful for language maintenance and language revival

Design focus: creating thematically organised spaces
  Creation from an ethnobotanical perspective
  Plants are important in traditional material culture and offer a natural way of teaching traditional knowledge
  Linking of data shall be visualised in one space which allows continuous navigation through the database

Some major software developments for our project purposes
  Improvement of UI issues, functionalities, etc.
  Development of the ViCoS tool: key feature for creating thematically organised spaces via relational links
  Unlike the Kirrkirr software, ViCoS can also integrate multimedia data, has good navigation and visualisation solutions, lets parts of a photo or drawing be selected, and offers a user-friendly way of creating relational links (drag & drop option, etc.), making it accessible for speech communities

Realisation in ViCoS

Realisation and navigation in ViCoS
  Jump to photo gallery

Linking media with lexicographic data: corpus-based examples
  Edition of corpus-based example sentences -> creating a resource for comparing spoken vs. written language
  Link to archive

Linking media with lexicographic data: made-up example sentences
  Link to archive with interlinearisation

Video clips: acting out meaning of motion verbals
  Documenting word meaning
  Letting consultants design and act out word meaning without verbal interaction
  Supportive element of word meaning, also useful for language revival
  Creation of semantic word fields (e.g. CUT or BREAK verbals) in ViCoS

Web-based collaboration with speech community
  Problematic aspects of web-based collaboration with the speech community (SC)
  Requirements for web-based collaboration with speech community (e.g. capacity building)
  Problems of using a wiki-like lexicon tool
  Proposal for speech community-based participation in the process of lexicon creation

Basic challenges for an online cooperation with speech community
  Current state of LEXUS and proposal of collaborative WSs have a wiki-like set-up based on consensus
  Who is a suitable administrator/primary editor?
  Is it really sufficient to make a web-based tool available and assume that an encyclopaedic lexicon will simply be created in a wiki-like manner by the speech community?

Design of collaborative WS by speech community
  Panel of moderators interacting with administrator and SC
  Complicated system of collaborative WS, not realistic
  Development and implementation are time-consuming
  Organising, editing and revising large amounts of new data with multiple entry writers and multiple drafts can get out of control

Community-internal obstacles I: linguistic situation
  In the context of endangered speech communities -> wiki-like set-up of collaborative WS is very problematic
  Documentation of lexical and cultural knowledge is not an easy task -> consultants do not share the same metalinguistic and cultural/encyclopaedic knowledge about words (Haviland 2006)
  Indigenous Polynesian languages -> undergoing rapid linguistic change
  Depending on age and upbringing -> metalinguistic knowledge very heterogeneous

Community-internal obstacles II: culture-specific reasons
  Problem rooted in their traditional society: very secretive about their culture, transmission of cultural knowledge is not a public affair -> often only one selected person within a family
  Unlike in western cultures, cultural knowledge has no open, verifiable and codified standards
  Continuous loss of linguistic and cultural heritage feeds into many insecurities of speakers -> ground for conflicts about what is authentic knowledge and what is not -> results in "editing wars"?
  Within the speech community: accusations of re-inventing and transforming the language and culture, knowledgeable speakers often stigmatised as "liars" -> withdrawal from documenting their endangered linguistic and cultural heritage

Community-internal obstacles III: cultures with oral traditions
  No writing tradition, difficult to motivate literate speech community members to express knowledge in writing
  Most knowledgeable community members often cannot read or write, total lack of IT skills
  Recording is a better way of fixing knowledge
  Transmission of traditional knowledge is still "observing and learning by doing"

Capacity building in the speech community
  Prerequisite: substantial training in basics of lexicography and usage of linguistic software
  Understanding lexicon structures (e.g. Toolbox) requires training and continuous familiarisation as well as constant repetition of usage over a protracted period of time
  Writing definitions, encyclopaedic articles and example sentences needs to be learned despite a simplified user interface
  New participants from the speech community have to be trained subsequently -> who does the training?

Psychological barriers
  Native speakers feel lost when having to edit lexical entries on their own
  Psychological blockade of writing lexical entries -> formal aspect of lexical entry structure puts pressure on contributors to do a good job
  Older community members have to learn to cooperate with younger community members who have good IT skills but lack knowledge about language and culture

Enrichment of lexicon with linguistic and encyclopaedic knowledge
  Sensitising speakers to the difference between describing the meaning of a word/lexical unit (= definition) and writing an encyclopaedic article -> encyclopaedic knowledge can be part of word meaning, lexical units can denote complex phenomena and procedures or culture-specific activities
  Enrichment of lexicon with linguistic and cultural knowledge is still best achieved during fieldwork periods based on mutual dialogue between researcher and consultants
    -> detailed investigations about language and culture, picking up on interesting comments, questions about grammar, semantic relations between lexemes, etc. are best obtained in face-to-face communication
    -> miscommunications and misunderstandings can be instantly clarified

New proposal for online participation by SC
  Both communities would like to have a limited "panel of moderators" interacting with the linguist
  Only reduced editing possibilities for the speech community
  Lexicon should be open to the community with reading rights only
  "Whiteboarding tool" should be available coupled with the LEXUS tool -> informal editing possible

Editing lexicon with whiteboarding tool Twiddla
  Web-based tool, access to websites, easy editing possibilities, edited page can be saved and sent as attachment

Web-based whiteboarding tool ReviewBasics
  Comment, annotate, mark up images, documents and videos, upload other media files, etc.
  User-friendly UI for editing documents and handling web-based collaboration
  Disadvantage: cannot access protected websites

Advantages of whiteboard editing
  Informal way of editing the lexicon and participating in its creation -> motivating effects on speech community
  Pressure of producing good definitions, encyclopaedic articles, etc. is taken away, no need to deliver complete definitions, etc.
  Playful aspect motivates younger speech community members to participate and consequently learn about their language and culture
  No interference with the lexical database as such, only in accordance with moderators
  Workload reduced for the panel of moderators (accept or reject changes)

Conclusions
  A web-based tool like LEXUS can be a powerful tool for
  1. Linguistic and cultural revival
  2. Visualising primary and structural data together (e.g. lexicon) -> new form of archiving, making linguistic and cultural networks more visible in KS of ViCoS
  Online participation of the (Marquesan and Tuamotuan) speech community is problematic if LEXUS is set up in a wiki-like manner
  LEXUS needs to be adjusted to the culture-specific circumstances of the speech communities
  A simplified user interface for SC will not solve the problem of online participation, contributors still need to learn the basics of lexicography
  Whether a lexicon can be enriched with detailed linguistic and encyclopaedic information by online participation of SC is doubtful, and such participation will not replace extensive fieldwork
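
Sketch of the proposed participation model
  The proposal above (lexicon readable by everyone, informal suggestions from the community, a small panel of moderators who accept or reject changes) could be modelled roughly as follows. This is only an illustrative sketch in Python: it is not LEXUS or ViCoS code, and every class, field and file name in it (Entry, Suggestion, ModeratedLexicon, "photo_001.jpg", etc.) is a hypothetical placeholder.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Dict, List, Optional, Set


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Entry:
    """One lexical unit with a definition and linked media file names (hypothetical model)."""
    headword: str
    definition: str
    media: List[str] = field(default_factory=list)


@dataclass
class Suggestion:
    """An informal proposal from a community member; it may be incomplete."""
    headword: str
    author: str
    note: str
    new_definition: Optional[str] = None
    new_media: Optional[str] = None
    status: Status = Status.PENDING


class ModeratedLexicon:
    """Read-only for the community; only moderator decisions change entries."""

    def __init__(self, moderators: Set[str]) -> None:
        self.moderators = set(moderators)
        self._entries: Dict[str, Entry] = {}
        self.suggestions: List[Suggestion] = []

    def add_entry(self, entry: Entry) -> None:
        # Assumed to be done by the linguist/administrator during fieldwork.
        self._entries[entry.headword] = entry

    def read(self, headword: str) -> Entry:
        # Return a copy so that readers cannot modify the stored entry.
        stored = self._entries[headword]
        return replace(stored, media=list(stored.media))

    def suggest(self, suggestion: Suggestion) -> None:
        # Anyone may suggest; nothing in the lexicon changes at this point.
        self.suggestions.append(suggestion)

    def review(self, moderator: str, suggestion: Suggestion, accept: bool) -> None:
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not on the panel of moderators")
        if suggestion.status is not Status.PENDING:
            raise ValueError("suggestion has already been reviewed")
        if not accept:
            suggestion.status = Status.REJECTED
            return
        entry = self._entries[suggestion.headword]
        if suggestion.new_definition is not None:
            entry.definition = suggestion.new_definition
        if suggestion.new_media is not None:
            entry.media.append(suggestion.new_media)
        suggestion.status = Status.ACCEPTED


# Usage with placeholder data (no real lexical material).
lexicon = ModeratedLexicon(moderators={"moderator_a"})
lexicon.add_entry(Entry("example", "draft definition"))
idea = Suggestion("example", "member_b", "a photo could be linked here",
                  new_media="photo_001.jpg")
lexicon.suggest(idea)
lexicon.review("moderator_a", idea, accept=True)
print(lexicon.read("example"))
```

  The design choice mirrored here is that a suggestion never touches the lexical database directly: it only takes effect once a moderator accepts it, which keeps the moderators' task to a simple accept/reject decision.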