Proposal_v1 - Research Data Alliance

Improving Access to Recorded Language Data

Simon Musgrave (Monash University, Australian National Corpus)
Linda Barwick (University of Sydney, PARADISEC)
Michael Walsh (University of Sydney, AIATSIS)

Researchers in different disciplines collect and store data that includes human language recorded in real time. Discovery of such data should be easy across disciplines, but it is currently impeded by differing disciplinary approaches and standards. For example, a linguist may have collected recordings of songs performed by speakers of the language they are studying; if these recordings are stored in an archive intended primarily for other linguists, a musicologist may not easily discover the resource even though it might be very relevant to their research. This paper will discuss the work of a recently formed Working Group within the Research Data Alliance which aims to address this problem by working towards standardisation of metadata elements in two areas: codes for identification of languages and language varieties, and categories for describing the content of resources.

For language identification, ISO 639-3 provides a set of three-letter codes to identify languages, but the standard is problematic for several reasons. Firstly, it is not adopted everywhere; for example, the digital collections of the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) use a different set of identifiers. This example also points to two further problems. One is that the divisions recognised by ISO 639-3 do not always align with expert understanding. This has been a particular issue for Australian languages, with a number of change requests filed with the registration authority for ISO 639-3. The other is granularity: a number of these change requests relate to delineating languages from linguistic entities below that level (such as dialects) and above that level (such as macrolanguages and language families).
Proposals for identification of entities at different levels of granularity are being considered within the ISO process; the Working Group aims to ensure that expert input to these processes is maximised, that the principles underlying the ISO 639 standards have a sound linguistic basis, and that registration and revision processes are consistent and transparent. Our assumption is that progress with these issues will lead to more consistent use of the standard by archives and repositories.

Existing metadata schemas (e.g. IMDI, OLAC) include vocabularies for describing the genres represented in linguistic resources, but these do not necessarily correspond to the needs of different disciplines. Consultation across different research communities is needed to establish the range of resource types which need to be covered and the vocabularies for describing that range. The Working Group will implement the results of this consultation by creating a set of metadata elements within the frameworks of the Component Metadata Initiative (CMDI) and the ISOcat data category registry. CMDI allows for the use of common metadata elements across different sites without imposing a rigid metadata scheme, while the ISOcat framework ensures that the semantics of (meta)data elements are explicit and accessible.

We hope that the activities of the Working Group will lead to improved discovery and access for researchers across disciplines who work with recorded language data, as well as improved possibilities for inter-repository data exchange.

Biographies:

Simon Musgrave is a lecturer in the School of Languages, Cultures and Linguistics at Monash University. Previously, he was a post-doctoral researcher at Leiden University and an Australian Research Council post-doctoral fellow at Monash.
His research interests include Austronesian languages, language documentation and language endangerment, African languages in Australia, communication in medical interactions, the history of English in Australia, and the use of technology in linguistic research. He is also involved in the Australian National Corpus project, serving on the steering committee from an early stage as well as being the treasurer of Australian National Corpus Inc.

Linda Barwick is Associate Dean, Research, at the Sydney Conservatorium of Music. She is an ethnomusicologist, specialising in the study of Australian Indigenous and immigrant musics, and the digital humanities (particularly archiving and repatriation of ethnographic field recordings as a site of interaction between researchers and cultural heritage communities). She has studied community music practices through fieldwork in Australia, Italy and the Philippines. Themes of her research include analysis of musical action in place, the language of song, and the aesthetics of cross-cultural musical practice. She has also published on theoretical issues, including analysis of non-Western music, and research implications of digital technologies.

Michael Walsh's research has focussed on the Top End of the Northern Territory over the last 30 years. This research includes descriptive and typological studies of Aboriginal languages as well as investigations into language use among indigenous Australians. An interest in lexical semantics has given rise to such studies as one on body part metaphors and another on nominal classification. Outside of strictly linguistic matters he has carried out research or advised on land claims, assessment of Aboriginal witnesses in legal settings and Native Title matters. One spin-off of these interests is a focus on cross-cultural communication problems between indigenous and other Australians.
