Week 1 Intro-general - Historical and Pama

Week 1: Overview
Tools for Language Documentation
Claire Bowern
Yale University
LSA Summer Institute: 2013
OVERVIEW, GOALS OF CLASS
Tools for documentation
• Physical tools:
• Hardware
• Software
• Stimuli
• Conceptual tools:
• What makes a good documentary corpus
• Procedural tools:
• How to go about documenting a language
• Tools for disseminating results
Overview
• Week 1: overview, hardware, software
• Week 2: elicitation techniques, grammar writing
• Week 3: narratives, conversation, corpus building
• Week 4: lexicon, archiving
About the class
• “How to describe/document a language”
• *No practical component* (in that we won’t be working with
speakers)
• However, there will be time (I hope!) to talk about your own
field data
• And we will be doing some exercises with existing data
• I will provide datasets for exercises (if you don’t have data of
your own to use)
• You can also use data from the field methods class here at the
Institute.
A few assumptions for this class
• Not talking about community-oriented materials here (I see
documentary materials as feeding into that though)
• Assuming that the language doesn’t have a lot of other
materials apart from what the linguist will be producing
• Assuming that the linguist will be the one doing most of the
writing.
• Implicitly assuming a grammar/dictionary/texts model (more
on this below).
• None of these assumptions is crucial; they’re just there so
we can limit the topic a bit.
PRINCIPLES OF DOCUMENTATION
What is language documentation?
• Documentary Linguistics as its own subfield.
• Doing things with linguistic data:
• Getting the data
• Preserving it
• Processing it
• (Analyzing it)
• Cf Woodbury (2002): Language documentation is the creation,
annotation, preservation, and dissemination of transparent
records of a language.
• Important for both theoretical and empirical branches of
linguistics:
• typology, historical linguistics, etc
What shapes the language
record?
• The linguist (i.e. you!)
• Their interests
• Their abilities
• The speakers and their interests!
• External circumstances
• funding
• time available
• lucky breaks
• unlucky breaks
Language Documentation as a
Language Legacy
• Particularly relevant for endangered languages.
• Your work might be the only substantive record of a language:
• few speakers
• field might view the language as “done”
• speakers might view the language as “done”
Planned Documentation vs
“Collect it all”
• “making a record of the language” : ‘comprehensive grammar’
• You can’t collect everything.
• All documentation is sampling.
• Unstructured, unanalyzed corpora usually aren’t very useful
• They are hard to use;
• They don’t get worked on;
• They usually aren’t big enough to test hypotheses
computationally;
• They require native speakers (or people who are already very
familiar with the language) -> fine for languages with a major
presence, but what about the quarter of the world’s languages
with fewer than 10,000 speakers?
What counts as documentation?
• When is a collection big enough to count as language
documentation?
• Is an article in Linguistic Inquiry language documentation?
• creation
• annotation
• preservation
• dissemination
• but only a very small fragment of a language.
How much time/space does a
documentary corpus take?
• Depends on the resources:
• Time
• Speakers
• Money
• Levels of Interest
Grammar, Dictionary, Texts
• “The Boasian Trilogy”
• Structure, Lexicon, Culture
• Way to present the analysis and also allow others to recreate
it (or challenge it) from the underlying data.
• Conceived broadly:
• Capture language structure
• Capture language in use
• Capture lexicon and meaning
Sampling: Documentation as
snapshots
• A big part of documentation is constructing a good set of
“samples”.
• To do that, you will need to consider what the purpose of the
documentary record is. That is, why are you collecting data on
the language?
• “to make a lasting record of the language”
• “to reclaim the language for future speakers”
• “to write a reference grammar”
• “to document the culture in the traditional language”
• “to investigate a particular aspect of the language”
• all of the above…
• …
Sampling
• Are your “snapshots” representative?
• Speakers
• Subjects/Topics
• Grammatical constructions
• Lexicon
• …
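One rough way to check whether the snapshots are representative is to tally what has been recorded so far. The sketch below is only an illustration: the session log, the speaker labels, and the topic labels are all hypothetical, and a real project would read them from its own metadata.

```python
from collections import Counter

# Hypothetical session log: one (speaker, topic) pair per recording.
sessions = [
    ("Speaker A", "narrative"),
    ("Speaker A", "elicitation"),
    ("Speaker B", "narrative"),
    ("Speaker A", "narrative"),
]


def coverage(sessions):
    """Count recordings per speaker and per topic."""
    by_speaker = Counter(speaker for speaker, _ in sessions)
    by_topic = Counter(topic for _, topic in sessions)
    return by_speaker, by_topic


by_speaker, by_topic = coverage(sessions)
for name, count in by_speaker.most_common():
    print(f"{name}: {count} recordings")
for topic, count in by_topic.most_common():
    print(f"{topic}: {count} recordings")
```

A lopsided tally (one speaker or one genre dominating) is a prompt to plan the next sessions differently, not a verdict on the corpus.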
Planned versus opportunistic
collection
• Planned:
• translated sentences.
• grammaticality judgments
• etc.
• Unplanned (or planning gone wrong):
• Speakers reinterpret your prompts and construct narratives from
them.
• New speaker comes to a session and wants to tell stories.
• You find a new (to you) morpheme in your data and want to find
out how it works.
• You overhear a new construction in conversation.
What constitutes a
documentary corpus?
• ***Everything***
• sound files
• videos
• transcripts
• (elicitation prompts – part of the annotation)
• photographs
• maps
• (artifacts)
• metadata (data about the data)
• metametadata
• …
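As an illustration of "data about the data", here is a minimal sketch of a metadata record for one recording session, saved as JSON next to the files it describes. The field names and values are assumptions made for this example, not a standard schema; archives usually have their own required fields.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class SessionMetadata:
    """Hypothetical metadata record for one recording session."""
    session_id: str
    date: str                  # ISO 8601 date, e.g. "2013-07-01"
    speakers: list
    audio_file: str
    transcript_file: str = ""  # empty until the session is transcribed
    prompts: list = field(default_factory=list)
    notes: str = ""


# Illustrative values only.
record = SessionMetadata(
    session_id="example-001",
    date="2013-07-01",
    speakers=["Speaker A"],
    audio_file="example-001.wav",
    prompts=["numerals 1-10"],
)

# Plain-text JSON stays readable without any special software.
metadata_json = json.dumps(asdict(record), indent=2, ensure_ascii=False)
print(metadata_json)
```

Keeping the record in a plain-text format means it can travel with the sound file and transcript and still be read years later.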
WORKFLOW AND DATA TYPES
Workflow:
1. What do you need to do to document a language?
2. What order do you need to do it in?
3. (How will you know if it’s been done right?)
Scaled workflow
• Project as a whole (timescale of years)
• e.g. “Bardi language documentation”
• Immediate tasks (timescale of weeks or months)
• e.g. “Bardi learners guide”
• Subtasks (timescale of days or weeks)
• e.g. “write the section on numbers”
• Data gathering (timescale of single session)
• e.g. “get data on numerals in use”
Workflow while on fieldwork
HARDWARE
Sample field kit:
• Equipment:
• Laptop
• Audio recorder
• Video recorder
• + microphones
• + backup means of recording (e.g. from laptop, second recorder)
• Media:
• backup devices [hard drive, DVDs, etc]
• memory cards for recorders
• paper! pens!
• Other
• ways of keeping the equipment clean
• carry bag
• stills camera (cell phone, iPad, etc)
• batteries, other power equipment
• tripod
• Stimuli/research prompts
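Backup devices only help if the copies are intact. The following is a minimal sketch, assuming recordings are ordinary files on the laptop and the backup drive is mounted as a folder; the function names are made up for this example. It copies a file and then compares SHA-256 checksums of the original and the copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and confirm the copy is byte-identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

Running something like this at the end of each session, before memory cards are wiped, catches a failed copy while the original still exists.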
Audio
• The field has converged on solid state recorders using SD cards
• Zoom Handy Recorder H2 or H4 (or H6, coming soon!)
• Edirol R-09
• Marantz PMD 660 or 670
• And/or laptops
• (or laptop plus external sound card/preprocessor)
• small/portable
• AA batteries
• high quality, lossless formats
• easy to use
• easy to transfer data
Not recommended:
• Dictaphones
• Cassette recorders
• DAT
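Since the point of these recorders is high-quality, lossless audio, it can be worth checking that a file really was saved as uncompressed WAV with adequate settings. This is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files only. The 44.1 kHz / 16-bit thresholds are an assumption for illustration, not a requirement stated in these slides; check what your archive asks for.

```python
import wave


def describe_wav(path):
    """Return basic recording parameters of an uncompressed (PCM) WAV file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def meets_minimum(info, min_rate=44100, min_bits=16):
    """Check a file against assumed minimum settings (illustrative defaults)."""
    return info["sample_rate_hz"] >= min_rate and info["bit_depth"] >= min_bits
```

Because `wave.open` raises `wave.Error` on files it cannot read as PCM WAV, a failure here is itself a hint that the recorder was set to a compressed format.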
Video
• Less consensus on models
• Major component of the documentation or side-project?
• Options:
• smart phone
• iPad
• stills camera with video function
• dedicated video camera
• SD card
• mic jack
• Problems:
• MPEG vs other proprietary video formats
• large files
• memory-intensive
Microphones
• headset vs lapel vs meeting microphone
• dynamic vs condenser, omnidirectional vs cardioid
• wired vs wireless
• XLR vs 1/8” jack
• The built-in mics in the Edirol, Handy, etc, are also ok
• You get what you pay for, approximately.
• Remember that microphone placement and volume
monitoring are much more important than the quality of the
microphone (far more recordings are ruined by poor placement
or unmonitored levels than by a poor microphone).
Computer
• Laptop
• Lots of memory
• Lots of hard drive space
• Usually don’t need ruggedization features
• Get cheapest possible and assume it won’t last for more than
a season, or try for a higher end model
• Special considerations for high altitude, high humidity, or low
temperature work.
• High altitude: hard drives fail: use solid state
• High humidity: condensation issues
• Low temperatures: battery issues (See Lanz 2010)
Tablets?
• Most language software won’t run on iPads or other tablets.
• Great for stimuli, backup recorder, camera, etc.
• Too much data