Week 1: Overview
Tools for Language Documentation
Claire Bowern
Yale University
LSA Summer Institute: 2013

OVERVIEW, GOALS OF CLASS

Tools for documentation
• Physical tools:
  • Hardware
  • Software
  • Stimuli
• Conceptual tools:
  • What makes a good documentary corpus
• Procedural tools:
  • How to go about documenting a language
• Tools for disseminating results

Overview
• Week 1: overview, hardware, software
• Week 2: elicitation techniques, grammar writing
• Week 3: narratives, conversation, corpus building
• Week 4: lexicon, archiving

About the class
• “How to describe/document a language”
• *No practical component* (in that we won’t be working with speakers)
• However, there will be time (I hope!) to talk about your own field data
• And we will be doing some exercises with existing data
• I will provide datasets for exercises (if you don’t have data of your own to use)
• You can also use data from the field methods class here at the Institute.

A few assumptions for this class
• Not talking about community-oriented materials here (I see documentary materials as feeding into that though)
• Assuming that the language doesn’t have a lot of other materials apart from what the linguist will be producing
• Assuming that the linguist will be the one doing most of the writing.
• Implicitly assuming a grammar/dictionary/texts model (more on this below).
• None of these assumptions are crucial, they’re just there so we can limit the topic a bit.

PRINCIPLES OF DOCUMENTATION

What is language documentation?
• Documentary Linguistics as its own subfield.
• Doing things with linguistic data:
  • Getting the data
  • Preserving it
  • Processing it
  • (Analyzing it)
• Cf Woodbury (2002): Language documentation is the creation, annotation, preservation, and dissemination of transparent records of a language.
• Important for both theoretical and empirical branches of linguistics:
  • typology, historical linguistics, etc

What shapes the language record?
• The linguist (i.e. you!)
  • Their interests
  • Their abilities
• The speakers and their interests!
• External circumstances
  • funding
  • time available
  • lucky breaks
  • unlucky breaks

Language Documentation as a Language Legacy
• Particularly relevant for endangered languages.
• Your work might be the only substantive record of a language:
  • few speakers
  • field might view the language as “done”
  • speakers might view the language as “done”

Planned Documentation vs “Collect it all”
• “making a record of the language” : ‘comprehensive grammar’
• You can’t collect everything.
• All documentation is sampling.
• Unstructured, unanalyzed corpora usually aren’t very useful
  • They are hard to use;
  • They don’t get worked on;
  • They usually aren’t big enough to test hypotheses computationally;
  • They require native speakers (or people who are already very familiar with the language) -> fine for languages with a major presence, but what about the quarter of the world’s languages with fewer than 10,000 speakers?

What counts as documentation?
• When is a collection big enough to count as language documentation?
• Is an article in Linguistic Inquiry language documentation?
  • creation
  • annotation
  • preservation
  • dissemination
• but only a very small fragment of a language.

How much time/space does a documentary corpus take?
• Depends on the resources:
  • Time
  • Speakers
  • Money
  • Levels of Interest

Grammar, Dictionary, Texts
• “The Boasian Trilogy”
• Structure, Lexicon, Culture
• Way to present the analysis and also allow others to recreate it (or challenge it) from the underlying data.
• Conceived broadly:
  • Capture language structure
  • Capture language in use
  • Capture lexicon and meaning

Sampling: Documentation as snapshots
• A big part of documentation is constructing a good set of “samples”.
• To do that, you will need to consider what the purpose of the documentary record is. That is, why are you collecting data on the language?
  • “to make a lasting record of the language”
  • “to reclaim the language to future speakers”
  • “to write a reference grammar”
  • “to document the culture in the traditional language”
  • “to investigate a particular aspect of the language”
  • all of the above…
  • …

Sampling
• Are your “snapshots” representative?
  • Speakers
  • Subjects/Topics
  • Grammatical constructions
  • Lexicon
  • …

Planned versus opportunistic collection
• Planned:
  • translated sentences.
  • grammaticality judgments
  • etc.
• Unplanned (or planning gone wrong):
  • Speakers reinterpret your prompts and construct narratives from them.
  • New speaker comes to a session and wants to tell stories.
  • You find a new (to you) morpheme in your data and want to find out how it works.
  • You overhear a new construction in conversation.

What constitutes a documentary corpus?
• ***Everything***
• sound files
• videos
• transcripts
• (elicitation prompts – part of the annotation)
• photographs
• maps
• (artifacts)
• metadata (data about the data)
• metametadata
• …

WORKFLOW AND DATA TYPES

Workflow:
1. What do you need to do to document a language?
2. What order do you need to do it in?
3. (How will you know if it’s been done right?)

Scaled workflow
• Project as a whole (timescale of years)
  • e.g. “Bardi language documentation”
• Immediate tasks (timescale of weeks or months)
  • e.g. “Bardi learners guide”
• Subtasks (timescale of days or weeks)
  • e.g. “write the section on numbers”
• Data gathering (timescale of single session)
  • e.g. “get data on numerals in use”

Workflow while on fieldwork

HARDWARE

Sample field kit:
• Equipment:
  • Laptop
  • Audio recorder
  • Video recorder
  • + microphones
  • + backup means of recording (e.g. from laptop, second recorder)
• Media:
  • backup devices [hard drive, DVDs, etc]
  • memory cards for recorders
  • paper! pens!
• Other
  • ways of keeping the equipment clean
  • carry bag
  • stills camera (cell phone, iPad, etc.)
  • batteries, other power equipment
  • tripod
• Stimuli/research prompts

Audio
• The field has converged on solid state recorders using SD cards
  • Zoom Handy H2 or H4 (or H6 coming soon!)
  • Edirol R-09
  • Marantz PMD 660 or 670
• And/or laptops
  • (or laptop plus external sound card/preprocessor)
• small/portable
• AA batteries
• high quality, lossless formats
• easy to use
• easy to transfer data

Not recommended:
• Dictaphones
• Cassette recorders
• DAT

Video
• Less consensus on models
• Major component of the documentation or side-project?
• Options:
  • smart phone
  • iPad
  • stills camera with video function
  • dedicated video camera
    • SD card
    • mic jack
• Problems:
  • mpeg vs other proprietary video formats
  • large files
  • memory-intensive

Microphones
• headset vs lapel vs meeting microphone
• dynamic vs cardioid
• wired vs wireless
• XLR vs 1/8” jack
• The built-in mics in the Edirol, Handy, etc., are also OK
• You get what you pay for, approximately.
• Remember that microphone placement and volume monitoring are much more important than the quality of the microphone (far more recordings are ruined through the former than the latter).

Computer
• Laptop
• Lots of memory
• Lots of hard drive space
• Usually don’t need ruggedization features
• Get cheapest possible and assume it won’t last for more than a season, or try for a higher end model
• Special considerations for high altitude, high humidity, or low temperature work.
  • High altitude: hard drives fail: use solid state
  • High humidity: condensation issues
  • Low temperatures: battery issues (See Lanz 2010)

Tablets?
• Most language software won’t run on iPads or other tablets.
• Great for stimuli, backup recorder, camera, etc.
• Too much data
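
Checking backups
The field kit lists backup devices (hard drive, DVDs) alongside the recorders' memory cards. One way to make sure a copy is safe before a card is reused is to compare checksums of the originals and the copies. The following is only a minimal sketch, assuming Python 3.9 or later is available on the field laptop; the function names `file_digest` and `verify_backup` and the folder paths in the usage note are hypothetical and not part of the course materials.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> dict[str, list[str]]:
    """Compare every file under source_dir with its copy under backup_dir.

    Returns relative paths that are missing from the backup, and relative
    paths whose contents differ from the original.
    """
    report: dict[str, list[str]] = {"missing": [], "mismatched": []}
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            report["missing"].append(relative.as_posix())
        elif file_digest(source_file) != file_digest(backup_file):
            report["mismatched"].append(relative.as_posix())
    return report
```

With hypothetical paths, `verify_backup(Path("/media/sdcard"), Path("/media/backup/session01"))` returns two lists; if both are empty, every file on the card has an identical copy on the backup drive. This checks only that the copy matches the original at the time of copying, not that the backup medium will stay readable.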