Archiving
David Nathan
Endangered Languages Archive
Hans Rausing Endangered Languages Project
SOAS, University of London

Topics
  Introducing ELAR and digital language archives
  Preservation
  Archive interactions with documentation
  What and how to archive
  Protocol
  Metadata
  Evaluation of audio
  Archives and revitalisation
  Archivism: mobilisation
  Video
  Conclusions

Introducing ELAR and digital language archives

Endangered Languages ARchive (ELAR)
  one of 3 semi-autonomous programs of the Hans Rausing Endangered Languages Project
  staff of 3: archivist, software developer, technician (research assistants etc)
  develop preservation infrastructure, cataloguing and dissemination; policies; facilities; training and advice; materials development and publishing

What is a digital language archive?
  a trusted repository created and maintained by an institution with a commitment to the long-term preservation of archived material
  will have policies and processes for materials acquisition, cataloguing, preservation, dissemination, migration to new digital formats
  a collection of managed materials

What is archiving of language materials?
  preparing materials in a structured form suitable for long-term preservation
  creating long-term relationships
  it is not backup
  it is not dissemination/publication
  it should not impinge on good linguistic practice

What can a language archive offer?
  Security - keep your electronic materials safe
  Preservation - store your materials for the long term
  Discovery - help others to find out about your materials
  Protocols - respect and implement sensitivities, restrictions
  Sharing - share results of your work, if appropriate
  Acknowledgement - create citable acknowledgement
  Mobilisation - create usable language materials for communities
  Quality and standards - advice for assuring your materials are of the highest quality and robust standards

Kinds of language archives
  many cross-cutting classifications:
  Indigenous vs outsider, eg.
  Squamish Nation
  regional vs international, eg. AILLA, Paradisec; DoBeS, ELAR
  associated with research institute, eg. AIATSIS, ANLC
  granter-funded, eg. DoBeS, ELAR, OTA
  digital vs physical vs mixed, eg. DoBeS vs Vienna Sound Archive, ANLC

Potential users
  speakers and their descendants - up to 95% of users of UCB are community members
  depositors - to create or renew materials
  other researchers - comparative/historical linguists, typologists, theoreticians, anthropologists, historians, musicologists etc
  other “stakeholders”, eg educationalists
  journalists and the wider public

Archives networks and bodies
  Digital Endangered Languages and Musics Archives Network (DELAMAN)
    ELAR, DOBES, ANLC, Paradisec, EMELD, LACITO, AIATSIS, AMPM (Maori)
  Open Language Archives Community (OLAC)
  others, eg. D-LIB http://www.dlib.org/
  Open Archives Initiative

Digital archive architectures
  OAIS archives define three types of ‘packages’: ingestion, archive, dissemination
  [Diagram: Producers -> Ingestion -> Archive -> Dissemination -> Designated communities]

‘Live Archives’ - architecture
  Boundary between depositors, users and archive: users add, update content; customise outputs
  [Diagram: Producers -> Ingestion -> Archive -> Dissemination -> Designated communities]

The way we were ...
  eg 1993: ASEDA Aboriginal Studies Electronic Data Archive at AIATSIS Canberra (modelled on Oxford Text Archive)
  opportunistically collect and catalogue electronic materials that were at risk or not accessible
    lexica
    grammars
    texts
    etc

How things have changed ..
  types of data (modalities and some genres)
  means of storage
  standardisation and metadata
  dissemination (most explosive)
  expanded into practice and workflow of linguists

ELAR’s holdings
  ELAR currently holds about 45 deposits with a total volume of approx 1.1 TB
  the average deposit is about 25 GB; however, sizes vary widely, with a few much larger deposits. The median size is around 10 GB
  we expect volume to nearly double over the next year
  see next slides for distribution of data types

ELAR holdings by data type
  data types for a representative sample (70%) of holdings
  data type by volume (MB) and number of files, sorted by volume

  Data type   Volume (MB)   Files
  audio           360,411   6,312
  video           208,995     895
  image            28,592   2,221
  msword              223     404
  pdf                 196     134
  eaf                  33     176
  text                 32     781
  lex                   9      29
  trs                   5     246
  xls                   1      19
  imdi                  1      26

If you are a depositor, ELAR will
  preserve your deposited materials
  provide for making changes where possible
  provide web-based metadata management
  implement your access restrictions etc
  give feedback about materials
  provide advice, general and specific assistance, eg data conversion
  provide some equipment and services
  on a case by case basis, develop resources

Preservation

Preservation issues
  making materials robust
  making storage robust
  organisational, ownership and policy issues
  changing technologies
    refreshing
    migrating

Changing technologies
  advantages of digital preservation
    primarily: copying
    items no longer unique
    also transmission, dissemination
  other implications
    robust formats (standard, open, explicit)
    formats with long horizons
    formats easy to refresh
    formats that don’t require particular software (sometimes software is intrinsic!)
    may have to describe software or even archive the software

Two preservation models
  “preserve the bytestream”
    keep the exact original at all costs
  LOCKSS
    “lots of copies keep stuff safe”
    http://lockss.stanford.edu/
    guess which community it came from!
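
Both preservation models rest on being able to tell whether a copy still matches the original byte for byte. A minimal sketch of that check follows. It is an illustration only, not a description of how ELAR or LOCKSS actually work; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(master: Dict[str, str], copy: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing from a copy, or whose bytes differ."""
    missing = sorted(name for name in master if name not in copy)
    changed = sorted(
        name for name in master if name in copy and master[name] != copy[name]
    )
    return {"missing": missing, "changed": changed}
```

A manifest built when materials are first stored can be compared with manifests of each later copy; an empty report for every copy is one way of showing that the bytestream has survived.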
Some backup issues
  risk management
  undetected problems and useless backups
  aspects of professional backup:
    scheduled frequencies, eg monthly, weekly, daily
    retention
    media and locations
    naming/versions
    proven restoration

Top 10 worst ways to collect/manage data
  1. No backup
  2. Divergent versions of same data
  3. Unlabeled disks/media
  4. Non-standard or undocumented filenames
  5. Master recordings used to review/analyse data
  6. Don’t know how characters are encoded
  7. Never tried to convert/export data
  8. Unprocessed or unedited audio and video
  9. Inconsistent recording
  10. Unmonitored recording

Archive interactions with documentation

Documenter and archive interactions
  grant formulation and application
  communications, questions, advice
  training
  archiving services

Documenter & archive interactions
  [Diagram]

Query/interaction topics
  analysis of approx 150 queries from documenters/linguists over nearly 2 years
  [Chart]

What and how to archive

What can you archive (at ELAR)?
  media - sound, video
  graphics - images, scans
  text - fieldnotes, grammars, description, analysis
  structured data - aligned and annotated transcriptions, databases, lexica
  metadata - structured, standardised contextual information about the materials

Archive objects
  informed by traditions, eg document archives
  sometimes called “resources”, bundles
  it could be a file, a set of files, a directory, a “session” or a coherent item with many parts
  should have archival qualities eg Bird & Simons “7 Dimensions” (or see Thieberger in LDD2)
  may impose standard structures or formats
  need deposit event and processes
    legal and protocol verification
    accession
    ongoing processes

Archive objects should be selected
  example: video: How much volume allocated?
  answer: ...
  however, e.g.: unlikely that linguist is in position to plan and consistently create excellent video, so selection is unavoidable
  data has always been selected!

(...
selection)
  in your typical work you also:
    selected
    labeled
    transformed/processed/edited
    added, corrected, expanded
    made links
    made or assumed relationships between “whole” and processed units; invented labels, IDs, scope etc
    imposed formats

Data portability
  Bird and Simons 2003: (for language documentation) our data should have integrity, flexibility, longevity and utility

Data portability
  complete
  explicit
  documented
  preservable
  transferable
  accessible
  adaptable
  not technology-specific
  (also appropriate, accurate, useful etc!!)

Formats - media - preferred
  sound - WAV
  image - BMP, TIFF, JPEG
  video - MPEG2

Formats - documents - preferred
  plain text, with or without markup
  PDF (PDF/A)
  XML, other systematic markup (with description of markup system)
  well-structured documents in common Office formats - ELAR will eventually convert them to archive formats
  character encoding:
    preferred encoding is ASCII or Unicode
    clearly document any other encodings used, e.g. ISO 8859-5
    discuss with us if you use font substitution to handle non-Roman characters

Formats - characters - preferred
  character encoding: ASCII or Unicode (UTF-8)
  you must clearly document any other encodings used, e.g. ISO 8859-9
  discuss with us if you use font substitution to handle non-Roman characters

Filenames and directories
  characters [A-Z], [a-z], [0-9], underscore and a single full stop before the extension
    = ASCII only, no spaces
  correct MIME extension
  favour lower case letters
  maximum length 30 characters
  maximum directory depth 8

Semantics of filenames
  don’t stuff meaningful information into filenames - use metadata instead
  versions
  use directory structures wisely

Data format duty cycle examples

                Raw        Working             Interchange   Archive    Dissemination
  Video         DVI        software-specific   MPEG-2        MPEG-2     MPEG2, AVI, QT
  Fieldnotes    Shoebox    Shoebox             FOSF          XML        WWW, print dictionary
  Audio         ATRAC      WAV                 WAV           BWF        MP3
  Complex data  multiple   FM Pro database     RTF, XML      XML        Interactive application
  Multimodal    multiple   multiple            as above      as above   Multimedia application page

Evaluation and conversion examples

Characters
  did my characters come through?
  answer: ...
  however: perhaps ELAR should do it?
  há pa ki hená mázaska wikcémna nú pa iyóphewa-ye ks t DBW
  wóz?az?a-s?ni yeló DB OK
  wash things-NEG ASS.M
  'he didn't do the wash'
  wóz az a-s ni yeló DB OK
  wash things-NEG ASS.M
  'he didn't do the wash'

Preservation
  Is my file preservable?
  Note:
    characters?
    inconsistent segmentation
    data as comments
    conventions/metadata
  Text transcription: “Korimáka”
  Language: Choguita Rarámuri
  used for transcription: Spanish
  Language Consultant: Luz Elena León Ramírez
  Linguist: abriela Cabaero
  Transcription: erth Fuen & Gabrela Cabaero
  Date recorded: 11/02/2006
  Date tranbscribed: 11/02/2006
  Recording: rec6-LEL.wav

Knowledge representation 1 - before
  wama momol chi naron mon chayako (LB) / wama momol chi naron chayako (MD)
  wama momol chi nan mon chayako (more emphatic(LB) / wama momol chi nan chayako (MD)
  Why don't you and him do it?
  + Notes have both of these sentences without the negator mon.
  OK runon naynangkroy ile ri
  He ate their sago.
  * kipin kannangkroy ngolu
  intended: We ate their cassowary.
  OK kipin kanangkroy ngolu
  We ate their cassowary.

Knowledge representation 1 - after
  * kipin kannangkroy ngolu
  intended: We ate their cassowary.
  OK kipin kanangkroy ngolu
  We ate their cassowary.

<sentence.set num="75">
  <version>
    <walman>Kipin kannangkroy ngolu</walman>
    <judgement>*</judgement>
  </version>
  <english>We ate their cassowary. </english>
</sentence.set>
<sentence.set num="76">
  <version>
    <walman>Kipin kanangkroy ngolu</walman>
    <judgement>OK</judgement>
  </version>
  <english>We ate their cassowary.</english>
</sentence.set>

Knowledge representation 2
  avoid generic software “convert to XML”

<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<PRODUCT BUILD="06/26/2002" NAME="FileMaker Pro" VERSION="6.0v2"/>
<DATABASE DATEFORMAT="M/d/yyyy" LAYOUT="" NAME="Videos" RECORDS="13" TIMEFORMAT="h:mm:ss a"/>
<METADATA>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Index name" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Image desc" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Date" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Content" TYPE="TEXT"/>
</METADATA>
<RESULTSET FOUND="13">
<ROW MODID="16" RECORDID="40">
<COL><DATA>Morly Beeta</DATA></COL>
<COL><DATA>Interview with Morly Beeta</DATA></COL>
<COL><DATA>Jan/13/05</DATA></COL>
<COL><DATA>Obu history by Morly Beeta</DATA></COL>
</ROW>

ELAR conversion - original

  Language                   Unangam Tunuu [Aleut Language]
  Dialects                   Qawalangin [Eastern Aleut] Nii}u}i{ [Western Aleut]
  Speakers                   Maria Turnpaugh, Nick Lekanoff, Clara Golodoff
  Place recorded             Unalaska, AK. Ray Hudson Room, Unalaska Public Library.
  Date recorded              7.21.04
  Recording name             UNAK2trk1
  Duration                   16:21 min.
  Recorded by                Alice Taff
  Recording equipment        Marantz CDR 300 recorder with one flat filtered table-mounted cardiod microphone. Also audio/video miniDV - Canon GL2.
  Translated by              Alice Taff with Maria Turnpaugh 000-493sec. Millie Prokopeuff 455-499sec.
  Transcribed by             Alice Taff
  Reviewed and corrected by  Moses Dirks

  129 ET Kamagala, afternoon afternoon
  135 CG Aang yes
  136 ET Sla{chxisaada{, ii? Nice weather. nice weather
  140 CG Yeah. Maku{ that's all right
  143 ET Alqutaadaltxichin? How are you? How are you all?

ELAR conversion - XHTML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>ANC14trk1</title>
<link href="taff.css" type="text/css" rel="stylesheet"></link></head><body>
<table class="metadata">
<tr><td>Language</td><td class="language">Unangax̌ (Aleut)</td></tr>
<tr><td>Dialect</td><td class="dialect">Niiĝuĝix̌ (Western Aleut)</td></tr>
<tr><td>Speakers</td><td class="speaker">Alice Petrivelli, Vera Snigaroff, Mary Snigaroff, Vivian Koenig</td></tr>
<tr><td>Place recorded</td><td class="place">Anchorage, Alaska </td></tr>
<tr><td>Date recorded</td><td class="date">Mar. 15, 2005</td></tr>
<tr><td>Recording name</td><td class="rec_name">ANC14trk1</td></tr>
<tr><td>Recorded by</td><td class="rec_by">Alice Taff, Piama Oleyer</td></tr>
<tr><td>Recording equipment</td><td class="rec_equip">Marantz CDR300 CD recorder with one flat-filtered, table-mounted cardioid microphone. </td></tr>
<tr><td>Translated/Transcribed by</td><td>Simeon L.
Snigaroff, December 2005</td></tr>
</table>

ELAR conversion - XHTML

<table class="transcript">
<tr><td class="time">1</td><td class="speaker">ap</td><td class="transcription">Uqlaĝiix̌, x̌aayax̌, uqlaĝil agach aliguutax̌ ax̌.</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td class="translation">To take a bath, Steam bath, to take a bath is the one that is Aleut</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td class="time">5</td><td class="speaker">vs</td><td class="transcription">uhmm</td></tr>

ELAR conversion - in browser

  Language                   Unangax̌ (Aleut)
  Dialect                    Niiĝuĝix̌ (Western Aleut)
  Speakers                   Alice Petrivelli, Vera Snigaroff, Mary Snigaroff, Vivian Koenig
  Place recorded             Anchorage, Alaska
  Date recorded              Mar. 15, 2005
  Recording name             ANC14trk1
  Recorded by                Alice Taff, Piama Oleyer
  Recording equipment        Marantz CDR300 CD recorder with one flat-filtered, table-mounted cardioid microphone.
  Translated/Transcribed by  Simeon L. Snigaroff, December 2005

  1  ap  Uqlaĝiix̌, x̌aayax̌, uqlaĝil agach aliguutax̌ ax̌.
         To take a bath, Steam bath, to take a bath is the one that is Aleut

Delivery of materials
  mostly we expect to receive copies on computer-readable media such as hard disks or CD/DVD
  DVDs seem consistently unreliable
  some digitisation of media may be possible

Protocol

Protocol
  sensitivities, restrictions: identification, description and implementation

Protocol grows naturally with documentation
  focus on recorded data » more people, more genres, less researcher knowledge
  focus on revitalisation » which language to teach? who to host and teach? who can learn?
  etc
  community participation » framework for speakers to shape documentation process and products
  mobilisation » selecting, juxtaposing; community participation
  time » significance and sensitivities change over time
  access » increasing scope for dissemination, control of IP

ELAR Deposit Form “Section C”
  ELAR pays careful attention to any sensitivities or restrictions that apply to any part of your deposit. There are four ways that Access Protocol is implemented:
    you define permissions for the whole deposit or for individual files (or parts of files)
    we provide defaults to protect your data if you do not define permissions
    you/we keep permissions up to date
    you list other rights holders

ELAR Deposit Form “Section C”
  P1. Anyone
    Any person may view/listen to or receive a digital copy of any part of the deposit
  P2. Certain people or groups
    Choose any combination of P2A, P2B, and P2C:
    P2A. Research community members
      What level of access (choose one only)?
      P2A1. They can receive a digital copy of requested material
      P2A2. They can view/listen but cannot receive a digital copy
    P2B. Language community members
      See below regarding identifying members
      What level of access (choose one only)?
      P2B1. They can receive a digital copy of requested material
      P2B2. They can view/listen but cannot receive a digital copy
    P2C. Particular named people or bodies
      See below regarding identifying people/bodies
  P3. Depositor is asked permission for each request
    You will be contacted and asked for permission on each request. How do you want to be contacted?
    P3A. Requester is given address to contact you directly
    P3B. ELAR will relay requests to you
  P4. Only the depositor has access
    Persons other than the depositor will not be able to request access.

ELAR Deposit Form “Section C”
  Identifying people/bodies
  If you chose P2B or P2C, tell us how ELAR should determine who is a member of a group (e.g. language community, educational body). Choose one of the following:
  M1.
  You tell ELAR how to determine membership (tell us in Part D)
  M2. ELAR will ask you on each occasion
  M3. ELAR will make a judgement about membership
  If you chose P2C, then list the names of the people or bodies in Part D.

  Contacting you
  If you choose P3A or P3B, you will be able to decide about each particular request. If the choice is P3A, we will send your address to the requester, who can then ask you directly for permission. You then send us your decision. If the choice is P3B, ELAR will act as an intermediary, and pass on the request to you, so that your privacy is maintained.
  However, if you chose one of P3A or P3B and you (or your delegate) are not contactable, ELAR will need to make the decision or change the access permissions. Similarly, if we need to contact you to ask about group membership, and you (or your delegate) are not contactable, we will need to make the decision or change the access permissions.

Other deposit, file or object-level protocol
  depositor-oriented
  we will provide means to change/manage protocol
  delegate
  other rights holders
  sunset clause

Metadata

Metadata
  Metadata
    the data about data that enables the management, identification, retrieval and understanding of that data
    reflects the knowledge and practice of data providers
    defines and constrains audiences and usages for data
  documentation’s data orientation heightens the importance of metadata

Metadata
  ELAR metadata set =
    selection from IMDI*, OLAC*, EAD, TEI
    ELAR-specific (e.g. protocol, geographical)
  depositor metadata
  * ie.
  a set of metadata elements that maps onto both IMDI and OLAC
  [Diagram labels: Archive, Deposit, ELAR metadata set, Your metadata, All other files]

Types of metadata
  depositor's / delegates' details
  descriptive metadata
  administrative metadata
  preservation metadata
  access protocols
  metadata for individual files

Depositors and delegates
  name
  address
  contact details (telephone, fax, email, URL)
  role
  affiliation
  date of birth
  nationality

Descriptive metadata
  title, description, subject, summary
  keywords
  subject Language, Community
  location
  time span

Administrative metadata
  project details
  funding and hosting institutions
  details of external copies
  modifications and status
  details of accession agreement cf. deposit form

Preservation metadata
  carrier media
  formats, size
  provenance (source)
  access
    access protocols (see elsewhere)
    group membership identification

File-level metadata
  media files
    duration, file size
    MIME type, content type
  text files
    font, character set, encoding
    format, markup
  metadata files
    schema
    scope
    validity

Metadata formats
  common or standard:
    IMDI (‘ISLE Metadata Initiative’, from DoBeS)
    OLAC (Open Language Archives Community)
    EAD, and others
  ELAR:
    has created its own set, currently in implementation
    deposit-scope metadata in deposit form
    file level metadata (will be) by web form
    also, depositor’s own metadata

Metadata formats
  each depositor can also have different metadata!
  our goal: to maximise the amount and quality of metadata
  quality and extent is more important than standards and comparability
  many depositors are sending extensive metadata in a variety of formats including spreadsheets - see examples

What’s missing from metadata?
pedagogy has typically been left out of the documentation agenda linguists are better at problematising languages than teaching them we should mobilise informed, effective and accountable pedagogy a Hippocratic imperative 71 Relationships relationships between documenters/ documentation and pedagogy nonexistent/poor cousin by-product documentation is a vector of language transmission! 72 Who could be documenters? 73 community members audio recordists videographers (documentary filmmakers) educators ethnobotanists anthropologists computer experts activists, missionaries linguists Multipurpose documentation? linguists of various specialisations anthropologists, historians, botanists ... do any have priority? who are documentation’s main beneficiaries? can we tell? 74 ... yes ... Metadata the data about data that enables the management, identification, retrieval and understanding of that data reflects the knowledge and practice of data providers defines and constrains audiences and usages for data 75 The key is metadata examples: IMDI, tiered morphological glossing etc standard (or “best practice”) metadata is strongly oriented to descriptive linguistics and typology (“aggregators”) How could metadata serve pedagogy? 76 Pedagogically oriented metadata demarcation, names and descriptions of socially/culturally relevant events such as songs (great interest to community members, and valuable teaching materials) should enormous amounts of time be spent providing morpheme-by-morpheme glosses if we cannot simply retrieve a song? 
  phenomena that provide learning domains, such as “numbers”, “kinship”, “greetings”, “tense”
  socially important phenomena such as register, code switching

Pedagogically oriented metadata
  notes on learner levels
  links to associated materials that have explanations, examples
  notes on the previous selection and use of material for teaching
  notes on how to use the material for teaching
  notes and warnings about restricted materials or materials which are inappropriate for young or certain classes of people (e.g. profane, archaic etc)
  and of course easily findable basic information such as name of language or variety, speaker, gender, speaker’s country etc

Evaluating audio

Dobbin
  software for audio evaluation, processing and reporting
  [Screenshots of Dobbin]

Archives and revitalisation

Keeping ‘means of transmission’ alive
  Romaine: co-ordinated efforts at revitalisation mean that institutions increasingly become the vector of language transmission, cf intergenerational transmission (Fishman)
  at the limit, documentations, and archives that foster, preserve, and disseminate them, become the means of transmission

Archives and revitalisation
  Penfield: toward a theory of documentation
    collaborative efforts
    onsite training
    document for revitalisation
    community-based protocols for the use of materials
  these have implications for the lifecycle of ‘data’

Archivism

What have we missed?
  Woodbury: most developments are "what's been happening around the emergence of a documentary linguistics", particularly technology, which has raised expectations more than changed practices

What have we missed?
  Contact with wisdom and experience of established fields e.g.
    radio/broadcasting (eg mics, MD)
    cinematography (eg quality and specialisation)
    journalism (eg equipment handling)
    audio archives (linguists had input to IASA before 80s or so)

What did we get?
advice about formats, parameters, what to avoid 'silver bullet' equipment and formats fundamentalism and format wars 93 Archivism Archivism: capitulation of language documenters to the agenda and priorities of archives and information technology why did this happen? for historical reasons rapid changes in technology we left a vacuum 94 Mobilisation 95 Mobilisation use of documentation resources to make relevant, useful, effective resources for language support and revitalisation 96 Gamilaraay/Yuwaalaraay song player uses ‘familiar’ data such as from Shoebox, Transcriber adds genre, functionalities, design etc 97 Song player data 98 <?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE Trans SYSTEM "trans-14.dtd"> <Trans scribe="elar" filename="YugalTrack33" version="1" version_date="050608"> <Episode> <Section type="report" startTime="0" endTime="87.445"> <Turn startTime="0" endTime="87.445"> <Sync time="0"/> \newsong14 [track33] music <Sync time="2.588"/> verse 1 line1 <Sync time="5.619"/> verse 1 line2 <Sync time="8.339"/> Song player data \song 34 [track28] \ti Gugan gaaynggul /Brown-skin baby \co Words and music: (c) Bob Randall \s Roger Knox \ln Gamilaraay \verse1 Dhayndalmuu ngaya dhurriyawaanhi dhayndalmuu ngaya dhurriya-y -waa-y -nhi priest I ride, -moving -Past s20148 m1590 m721 -m1733 -m1699 As a preacher I used to ride Yarraamanda binaal yarraaman -ga binaal horse -in,at,on peaceful m2020 -m755 m244 A quiet horse on the plains. 99 Walaaybaaga walaay -baa -ga nhama nhama that,the m1686 wagibaaga. 
wagibaa -ga plain -in,at,on s20467 -m755 gamila ngaya muurr gigi gamila ngaya muurr gi-gi -gi Song player data Chunking data: verses etc: [2,4,6,8,10,12,14,16,18,20,22,24] labels: [1:"Verse 1", 3:"Chorus", 4:"Verse 2", 6:"Chorus", 7:"Verse 3", 9:"Chorus", 10:"Verse 4", 12:"Chorus"] Play it 100 Other examples of ‘mobilisation’ Simple or conventional games etc can take on new significance Memory game play Crossword play 101 Video in documentation and archiving “Questioning the role of video in language documentation & archiving: is a moving picture worth 1,000 texts?” 102 The rise and rise of video increase in claims about video rise from about 25% to 75% of ELDP applicants funders have been demanding that some applicants make video 103 One size fits all? Himmelmann: the core of a language documentation, then, is constituted by a comprehensive and representative sample of communicative events as natural as possible. Given the holistic view of linguistic behaviour, the ideal recording device is video recording. 104 Goals and methodology of documentation cultural and cognitive aspects can be documented or augmented by video (examples from Harrison) counting methods/systems locative expressions behaviours or appearances of plants animals etc that are described as part of language-encoded knowledge: • information about plant toxicity and preparation could usefully be video • swimming formations (eg Marovo people of Solomon Islands who have rich set of terms for fish behaviour and its relationships to the calendar and hunting) • Gila Pima (Arizona) name a plum tree "dog's testicles", and an edible banana "looks like an erection" (umm, what will the videos show?) 
However, David Crystal estimates that such culturally/environmentally specific aspects are only about 10% of any languages’ content 105 Goals and methodology of documentation discourse and genre distinguishing participants (McConvell) transparently capturing “stories” (Wittenburg) adding or enhancing methodology stimulus materials the camera adds theatricality (Jukes) the camera as a participant (Atkins) enhance transcription through motivating community participation sign language work treat video as inscription cameras, lighting, orientation, clothing etc appreciated by communities 106 Goals and methodology of documentation documentation can’t aim to capture everything (Austin) and the video camera cannot either! argument for accountability has caused confusion between events and recordings. Result: fantasy that video is what happened and provides empirical evidence for all kinds of claims argument: video can do X => we should do video fails without goals and methodology for X many pro-video arguments could be equally applied to capturing other phenomena: e.g. palatography collecting other text-based metadata eg on social setting 107 Goals and methodology of documentation there must be different methodologies (linguistic AND video) for different purposes (cf. sign) Himmelmann: [each potential discipline’s usages] influence the recording and presentation of the data inasmuch as certain kinds of information are indispensable for a given analytical procedure (no phonetic analysis is possible without some high-quality sound recording, no analysis of gestures is possible without videotaping, etc.) 108 Goals and methodology of documentation so if there are distinct methodologies for different purposes how adequate could a generic video be? how can video serve purposes that documenters don’t have? 
Goals and methodology of documentation
  explicit claimed purposes for video:
    in ELDP applications, many applicants request funds for video equipment but have no video-related documentation goals
    and video exponents describe the potential of video but few documenters actually have these goals

Goals and methodology of documentation
  many phenomena can't be represented on video:
    complex family structures and their terminologies
    changes in moon shape and phase (better as still photos or diagrams); other calendric and geographic expressions
    time and distance eg Tofa (Siberia) have words for the distance you can cover in a day on reindeer back
    morphological, grammatical and most lexical information
    (also relationships, staging, motivations, histories...)

Video: a community oriented technology
  video is good for:
    community oriented content
    community involvement
    members will best know what/how to shoot
    skills transfer
    creating directly usable materials, including for revitalisation
  why should a linguist shoot video at all?

Video workflow and workload
  a disorder of magnitudes ...
  skills, workload, intrusion, volumes - all increase by orders of magnitude
    skills - equipment, shooting, editing, production
    equipment - choice, usage, maintenance
    power supplies
    capturing, conversion
    annotation
    editing, production
    data volumes

Workflow and workload
  annotation: could easily involve a time ratio of up to 100 (1 hour of video may take 100 hours to process)
  in practice, most documenters do not annotate the phenomena that they did (or didn’t) identify
  fallacy that annotation etc can be done later
    • video amplifies the value of event-participant knowledge

Video: conclusions
  video can:
    add to the representational methods used by linguistics
    encourage us to look at diverse phenomena
    challenge our methodologies
    provide new and effective ways of disseminating language and cultural events and knowledge

Video: conclusions
  video and multimedia
    little encouragement to produce multimedia
    multimedia:
      • distinguishes medium from mode of knowledge representation
      • richer and more explicit interleaving of various types of knowledge
      • imposes its costs in more appropriate areas

Video: conclusions
  generic, amateur video fails to respect participants by not recognising linguistic specialisation, complexity or expertise to the same degree as “real” linguistic work
  naive video achieves “authenticity” mainly by not editing (and thereby not producing usable products!)

Video: conclusions
  there is a lot of tradition in evaluating the descriptive value of linguistic work, but little in defining the documentation value of video
  if video really represents the claimed range of linguistic phenomena, it is a key mode of documentation: documenters (and their teachers) need to pay much closer attention to its methodologies!
  it is not clear that it is linguists who should be making video

Conclusions

Conclusion: we ask depositors to
  manage materials well
  collect and provide protocol information
  deliver materials, metadata
  send trial samples etc
  not withhold materials
  share/manage/delegate custodianship of materials
  maintain relationships with language stakeholders and ELAR

Conclusion
  digital language archives combine traditional preservation with new ways of supporting creators and users of materials
  an archive can be more effective if materials are prepared as “portable”
  ultimately it is up to documenters to define what good documentation is
  ELAR welcomes you to discuss your archiving goals
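
Managing materials well includes checking file names before delivery. The rules on the “Filenames and directories” slide can be tested mechanically, and the sketch below shows one way to do so. It is not an ELAR tool. The function and constant names are hypothetical, and three points are assumptions the slide does not settle: that the 30-character limit includes the extension, that directory names use the same characters without a full stop, and that depth counts the directories above the file.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Limits quoted on the "Filenames and directories" slide.
MAX_NAME_LENGTH = 30
MAX_DIRECTORY_DEPTH = 8

# Assumption: one full stop, separating an alphanumeric extension from the stem.
FILE_NAME = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")
# Assumption: directory names use the same characters, without a full stop.
DIRECTORY_NAME = re.compile(r"[A-Za-z0-9_]+")


def filename_problems(relative_path: str) -> List[str]:
    """List rule violations for a path given relative to the deposit root."""
    path = PurePosixPath(relative_path)
    directories, name = path.parts[:-1], path.name
    problems = []
    if not FILE_NAME.fullmatch(name):
        problems.append("file name: use A-Z, a-z, 0-9, underscore and one full stop")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("file name: longer than 30 characters")
    if len(directories) > MAX_DIRECTORY_DEPTH:
        problems.append("path: more than 8 directory levels")
    for directory in directories:
        if not DIRECTORY_NAME.fullmatch(directory):
            problems.append("directory name: " + directory)
    if name != name.lower():
        # The slide only says to "favour" lower case, so this is advice.
        problems.append("advice: favour lower case letters")
    return problems
```

Under these assumptions, the recording name from the “Is my file preservable?” slide, rec6-LEL.wav, would be reported twice: once for the hyphen and once, as advice only, for the upper case letters. A name such as rec6_lel.wav would pass.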