Current Trends in Documentation of Endangered Languages
Peter K. Austin
ELAP, Department of Linguistics, SOAS

Thanks to Oliver Bond, Lise Dobrin, Lenore Grenoble, David Nash and David Nathan for discussion of the ideas in this presentation; they are absolved of responsibility for errors

Outline
Documentary linguistics and language documentation
Components and skills for documentation
Some current issues and future concerns
Conclusions

Documentary linguistics
new field of linguistics “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998, 2006)
has developed over the last decade in large part in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them, fuelled also by developments in information and communication technologies
essentially concerned with the role of language speakers and their rights and needs

Features of documentary linguistics
Himmelmann (2006:15) identifies important new features of documentary linguistics:
Focus on primary data – language documentation concerns the collection and analysis of an array of primary language data to be made available for a wide range of users;
Explicit concern for accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected;
Concern for long-term storage and preservation of primary data – language documentation includes a focus on archiving in order to ensure that documentary materials are made available to potential users into the distant future;
Work in interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to linguistics alone;
Close cooperation with and direct involvement of the speech community – language documentation requires active and collaborative work with community members both as producers of language materials and as co-researchers.

A contrast
language documentation: activity of systematic recording, transcription, translation and analysis of the broadest possible variety of spoken (and written) language samples collected within their appropriate social and cultural context
language description: activity of writing grammar, dictionary, text collection, typically for linguists
Ref: Himmelmann 1998, Woodbury 2003

Uses of documentation
documentation outputs are multifunctional for:
linguistic research - phonology, grammar, discourse, sociolinguistics, typology, historical reconstruction
folklore - oral literature and folklore
poetics - metrical and musical aspects of oral literature
anthropology - cultural aspects, kinship, interaction styles, ritual
oral history, and education - applications in teaching language revitalisation

Users of documentation
collection, analysis and presentation of data useful not only for linguistics but also for research into the socio-cultural life of the community
analysed and processed so it can be understood by researchers of other disciplines and does not require any prior knowledge of the language in question
usable by members of the speaker community
respects intellectual property rights, moral rights, individual and cultural sensitivities about access and use and is done in the most ethical manner possible

The documentation record
core of a documentation project is usually understood to be a corpus of audio and/or video materials with transcription, multitier annotation, translation into a language of wider communication, and relevant metadata on context and use of the materials
the corpus will ideally be large, cover a diverse range of genres and contexts, be expandable, opportunistic, portable, transparent, ethical and preservable (Woodbury 2003)
as a result documentation is increasingly done by teams rather than ‘lone wolf linguists’
need to see grammatical analysis and description as a tertiary-level activity contingent on and emergent from the documentation corpus

Phases in documentation project
Project conceptualisation and design
Establishment of field site and permissions
Funding application
Data collecting and processing (including archiving)
Creation of outputs
Monitoring, evaluation and reporting

Phases in data collection and analysis
Recording – of media and text (including metadata)
Capture – analogue to digital transfer
Analysis – transcription, translation, annotation, notation of metadata
Archiving – creating archival objects, assigning access and usage rights
Mobilisation – publication and distribution of materials

Some current issues and challenges
Documentation versus description
The ‘representative’ record
Quality of language documentation
Commodification
Interdisciplinarity
Training for language documentation
Communicating with the wider world

Documentation vs description
Himmelmann and others have tried to distinguish language documentation from language description, but it is unclear whether such a separation is truly meaningful and, even if it is, where the boundaries between the two might lie.
Documentation projects must rely on application of theoretical and descriptive linguistic techniques, if only to ensure that they are usable (i.e. have accessible entry points via transcription, translation and annotation) as well as to ensure that they are comprehensive. It is only through linguistic analysis that we can discover that some crucial speech genre, lexical form, grammatical paradigm or sentence construction is missing or under-represented in the documentary record. Without good analysis, recorded audio and video materials do not serve as data for any community of potential users. Similarly, linguistic description without documentary support is sterile, opaque and untestable.
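
The point that gaps in the record only show up once the corpus is examined can be illustrated, very crudely, with a tally over session metadata. The following is a minimal sketch in Python, not a method proposed in this presentation: the `genre` and `minutes` fields, the sample sessions, the target genre list and the 10% threshold are all assumptions made up for illustration. It only flags genres that are missing or thin in recorded time; finding missing lexical forms, paradigms or constructions still requires linguistic analysis.

```python
from collections import Counter

# Hypothetical session metadata; field names and values are illustrative only.
sessions = [
    {"id": "s01", "genre": "narrative", "minutes": 42},
    {"id": "s02", "genre": "narrative", "minutes": 35},
    {"id": "s03", "genre": "conversation", "minutes": 18},
    {"id": "s04", "genre": "ritual", "minutes": 5},
]

# Assumed target genres for this sketch (not a standard inventory).
TARGET_GENRES = ["narrative", "conversation", "ritual", "directive"]


def coverage_gaps(sessions, target_genres, min_share=0.10):
    """Return genres that are missing or below min_share of recorded minutes."""
    minutes = Counter()
    for s in sessions:
        minutes[s["genre"]] += s["minutes"]
    total = sum(minutes.values())
    gaps = {}
    for genre in target_genres:
        # Counter returns 0 for genres that were never recorded.
        share = minutes[genre] / total if total else 0.0
        if share < min_share:
            gaps[genre] = round(share, 3)
    return gaps


gaps = coverage_gaps(sessions, TARGET_GENRES)
print(gaps)
```

Recorded minutes are used here rather than a count of sessions, since sessions vary in length; either choice is a proxy and says nothing about how well a genre is actually analysed.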

The “representative” record
On a theoretical level, one can define “representative” documentation as the collection of sample texts of all discourse types, all registers and genres, from speakers representing all ages, generations, socioeconomic classes, and so on.
On a practical level, however, there are concrete limitations to the range and number of texts which can be collected, transcribed and analysed. Most linguists cannot devote their entire careers to time in the field, which would be required for a truly thorough collection and analysis of data.
A solution (proposed by Seifart in LDD 5) is sampling, i.e. identification of some subset of types that is representative of the language as a whole – but how do we do this in a meaningful way: (i) for an individual language, (ii) cross-linguistically in a comparable manner?

Sampling criteria
Criteria for differentiation of communicative events:
“Ways of speaking” as distinguished in specific culture / speech community (Ethnography of Communication)
Medium: spoken / written
Plannedness: unplanned / planned
Register: formal / informal
Manner of obtaining data: spontaneous (‘natural’) vs. elicitation vs. stimulated
Target: child-directed / adult-directed / foreigner-directed

It is clear that the success of a documentation project rests on intimate collaboration with community members. In the ideal, they can be trained to be engaged in data collection themselves, thereby expediting the process (e.g. Florey 2004). Even if this is not possible, community members can direct (external) linguists to varying discourse types and to differing speech patterns.
Note however that this could result in a focus on rare/unusual/unique discourse types that are in no sense ‘representative’
Himmelmann (2006:66) identifies five major types of communicative events ranged along a continuum from unplanned to planned (next slide)
however it is not clear that this typology is applicable to all languages and all speech communities – just what a ‘representative’ account of language in use is remains unclear, and perhaps the notion should be abandoned

Himmelmann genres
Parameter    Major types      Examples
Unplanned    exclamative      Ouch! Fire! Jishin da!
             directive        Scalpel! Sit! Achi ike!
             conversational   greetings, small talk, chat, discussion, interview
             monological      narrative, description, speech, formal address
Planned      ritual           prayer, ceremonial address

Quality of documentation
There is a tendency among some researchers to equate documentation outcomes with archival objects (part of what David Nathan has termed ‘archivism’), that is, the number and volume of recorded digital audio and/or video files and their related transcription, annotation, translation and metadata. Mere quantity of objects is not a good proxy for quality of research.
Equally, some would argue that outcomes which contribute to language maintenance and revitalization are the true measure of the quality of a documentation project (what better success of an endangered language project than that the language continues to be used?).
So how could we measure ‘quality’ of a documentary corpus? What parameters might be included?

Possible metrics
volume (quantity) as a proxy
form
  media – audio, video, stills – how measured?
  text – explicit, transparent, well-structured, standardised, richly detailed, machine-readable
  links (relations, hypertext, multimedia) – explicit, well-structured, machine-readable

More possible metrics
content:
  new – never inscribed before
  unique – not readily replicable
  interesting …
organisation and management (workflow, transformations, archiving)
relevance and use of outputs for stakeholders
impact on community of speakers (or other stakeholders)
impact on future of language

Commodification
reduction of languages to things and their treatment as if they were a tradeable commodity
reflected in language documentation through the transformation of languages into bounded objects, indices, technical encodings, and exchangeable goods
results from forces of objectification, standardisation and audit that shape the management of information in contemporary Western culture, especially academic culture with its focus on outputs and counting (e.g. RAE, RQF, citation indices, research impact statements etc.)
also reflects a theoretical and methodological vacuum that has been filled not by linguistics but by preservationists, archives and technologists

Languages as bounded objects
selections of phenomena crystallised into a singular “language”
languages placed within boundaries, on maps etc.

Languages as indices
language vitality indicators: UNESCO defines 9 criteria with 6 scoring levels; SIL uses 8 indicators
these objectify languages: the vitality of an individual language can be quantified, and languages can be ranked according to degree of endangerment
UNESCO presents a deterministic relationship between the 9 factors and the vitality and function of languages: “taken together, these nine factors can determine the viability of a language, its function in society and the type of measures required for its maintenance or revitalization”

Languages as exchangeable goods
goal of research is for languages to be ‘preserved’ as ‘resources’ that ‘consumers’ (linguists et al.) discover and access via ‘service providers’ (OLAC publicity)
linguists’ professional obligations to speaker communities now often formulated in grant applications and elsewhere in terms of transacted objects (language primers, CDs, books) rather than knowledge sharing, joint engagement in language maintenance activities or other interactions
granting agencies require linguist’s bona fides to be distilled into a ‘letter of support’ from ‘an appropriate representative of the language community’, thus turning a complex of social and political dynamics into an object that is used to legitimise the research

Languages as technical encodings
quantifiable properties (recording hours, data volume, file parameters) and technical desiderata (‘archival quality’, ‘portability’, standardised ontologies) have become reference points in discussing and assessing the methods and goals of documentation
results in grant application by formula: 100 hours of 16 bit 44.1 kHz audio, 25 hours MPEG-2 video, 10% ELAN .eaf files and Toolbox annotations
technical parameters replace balanced discussion of documentation methods; e.g. video recordings proposed without reference to hypotheses, goals or methodology; avoidance of data compression substitutes for knowledge of art of audio recording; file formats named rather than corpus structure described

Interdisciplinarity
Himmelmann and others have pointed to the importance of taking a multidisciplinary perspective in language documentation and drawing in researchers, theories and methods from a wide range of areas, including anthropology, musicology, psychology, ecology, applied linguistics etc. (see Harrison 2005, Coelho 2005, Eisenbeiss 2005).
True interdisciplinary research is difficult to achieve, both because of different theoretical orientations and because of practical differences in approach (ranging from the traditionally different practices of linguists and anthropologists concerning payments for consultants to more significant differences in academic paradigm that make communication and understanding fraught).
Mainstream linguistics has tended to turn away from other disciplines and to emphasise its ‘independence’ by concentrating on theoretical concerns that are of internal interest to linguists only (minimalism, OT phonology – see Libermann 2007).
Documentary linguistics opens new doors to interdisciplinary collaboration but we need to work out how to achieve it.

Reaching the wider world
There are great opportunities for communicating about language and language issues to the general community
At SOAS we have run “Endangered Languages Week” in 2007 and 2008, film showings, public lectures, exhibitions (“Disappearing Voices”), David Crystal’s play (“Living On”)
We see part of your work as ELDP grantees as including outreach and communication activities – we will encourage you to contribute “stories” and images for things like the HRELP annual report, the website etc.

Exhibition

Identifying the gaps
The discourse of endangered languages and language documentation has a strong moral and emotional power which has not been matched by conceptual guidance on what linguistics and linguists can do in response
publications and debates about effective and appropriate documentary methodologies for linguists have been slow to develop, resulting in many unanswered questions:
  are the goals of documentary linguistics social or formal?
  are its data symbolic or digital recordings of events?
  what role(s) should archives play?
  how could we decide between competing interests?
we lack a framework for assessing quality, value, effectiveness and progress of our work, so documentary linguists fall back on established patterns like quantifiable indices and technical standards

Setting some agendas
recognising that some of the challenges described here derive from bureaucratic and technological contexts and should not be taken for granted as defining the discipline
we need to develop a new approach to language documentation that implements the moral and ethical vision that has attracted new participants
replacing the rhetoric that documentation is a separate discipline from descriptive linguistics with a better understanding of their respective goals, methodologies and evaluative criteria
and locating documentation within a wide range of interdisciplinary approaches to human language
with development of appropriate training and outreach

Our goals for the training course
To expose you to good practices in documentation (recording, analysis, archiving, mobilisation, ethics and IPR)
To raise issues that we see as theoretical and practical challenges and to share experiences and ideas (a two-way process)
To begin what we hope is a long-term ongoing relationship between you as researchers and us as trainers, archivists, researchers and all-round good guys

The end