PPT - Hans Rausing Endangered Languages Project

Questioning the role of video in
language documentation & archiving:
is a moving picture worth 1,000 texts?
ELDP training March 2010
David Nathan
djn@soas.ac.uk
Endangered Languages Archive
School of Oriental and African Studies
University of London
www.hrelp.org
The rise and rise of video
Increase in claims about video
Rise from about 25% to 75% of ELDP applicants
ELDP Panel has been demanding that some
applicants make video
Themes
• Goals and methodology of language documentation
  • One size fits all
• The nature of the video medium
  • Uninventing the massage
• Workflow and workload
  • Disorder of magnitudes
• Community skills and needs
  • On Hippocrisy
• Data portability and archiving
  • Handling the bytes that feed
Goals and methodology of language
documentation
One size fits all
Himmelmann:
The core of a language documentation, then, is
constituted by a comprehensive and representative
sample of communicative events as natural as
possible. Given the holistic view of linguistic behavior,
the ideal recording device is video recording.
Goals and methodology of language
documentation
• Cultural and cognitive aspects can be documented or augmented by video (examples from Harrison)
  • counting methods/systems
  • locative expressions
  • behaviours or appearances of plants, animals etc. that are described as part of language-encoded knowledge:
    • information about plant toxicity and preparation could usefully be videoed
    • swimming formations (e.g. the Marovo people of the Solomon Islands, who have a rich set of terms for fish behaviour and its relationships to the calendar and hunting)
    • the Gila Pima (Arizona) name a plum tree "dog's testicles", and an edible banana "looks like an erection" (umm, what will the videos show?)
However, David Crystal estimates that such culturally/environmentally
specific aspects are only about 10% of any language's content
Goals and methodology of language
documentation
• Discourse and genre
  • distinguishing participants (McConvell)
  • transparently capturing “stories” (Wittenburg)
• Adding or enhancing methodology
  • stimulus materials
  • the camera adds theatricality (Jukes)
  • the camera as a participant (Atkins)
  • enhance transcription through motivating community participation
• Sign language work
  • treat video as inscription
  • cameras, lighting, orientation, clothing etc.
• Appreciated by communities
Goals and methodology of language
documentation
• Documentation can’t aim to capture everything (Austin)
  • And the video camera cannot (cf. next section)
• Argument for accountability has caused confusion between events and recordings. Result: the fantasy that video is what happened and provides empirical evidence for all kinds of claims
• Argument:
  • video can do X => we should do video
  • fails without goals and methodology for X
• Many pro-video arguments could be equally applied to capturing other phenomena in other media:
  • e.g. palatography
  • collecting other text-based metadata, e.g. on social setting
Goals and methodology of language
documentation
There must be different methodologies (linguistic
AND video) for different purposes (cf. sign)
Himmelmann:
[each potential discipline’s usages] influence the
recording and presentation of the data inasmuch as
certain kinds of information are indispensable for a
given analytical procedure (no phonetic analysis is
possible without some high-quality sound recording, no
analysis of gestures is possible without videotaping,
etc.)
Goals and methodology of language
documentation
So if there are distinct methodologies for different
purposes (e.g. sign)
how adequate could a generic video be?
how can video serve purposes that documenters don’t
have?
Goals and methodology of language
documentation
Explicit claimed purposes for video:
In ELDP applications, many applicants request funds
for video equipment but have no video-related
documentation goals
vs
Video exponents describe the potential of video but few
documenters actually have these goals
Goals and methodology of language
documentation
Many phenomena can't be represented (cf
Harrison):
complex family structures and their terminologies
changes in moon shape and phase (better as still
photos or diagrams); other calendric and geographic
expressions
time and distance, e.g. the Tofa (Siberia) have words for the
distance you can cover in a day on reindeer back
morphological, grammatical and most lexical
information
(also relationships, staging, motivations, histories...)
Goals and methodology of language
documentation
Community-orientation
community oriented content
members will best know what/how to shoot
why should linguist shoot video at all?
Goals and methodology of language
documentation
• Video footage is not data
  • video is less “authentic” than audio: it frames with a hard edge rather than “listens” to an environment
  • video is more bounded, more intentional than audio
  • selection (time/space), point of view etc.
  • video content is multifaceted
• Video data example: the traffic camera
  • nature of data defined
  • informs methodological choices for capture of data
The nature of the video medium
Uninventing the massage
Video is compelling, holistic, humanistic
Video “tells a story”
much of what we want to capture is already a story
(Wittenburg)
There is a filmic language for telling stories that derives from human perception and narrative,
plus 100 years of cinematic evolution
Filmmakers “pour scorn” on film-as-truth (Weaver)
The nature of the video medium
“Shoot to edit” - dictum of filmmakers
more than a recommendation for good filming, a
diagnostic for whole approach
implies a view to methodology and outputs
ethics inform editing, they do not exclude it
Limits:
maximal: storyboards (pre-planned action and shots)
minimal: one that generates data - the traffic camera
The nature of the video medium
• Filmer has to know the nature of the events (e.g. football vs. opera)
• Video is not ideal for spontaneous events except:
  • bounded situations with conventions, e.g. dinner party
  • for accidental capture of “treasures” (i.e. home movies)
• Naivety of considering editing as “interference”
  • editing is natural to the way we see and to the film medium
  • story or message is achieved through editing
  • linguists’ other work (from transcription to grammars) can be understood as intense, informed editing
  • objections to editing could be diagnostic of lack of relevant methodologies/goals/skills
• Training required. Filmic skills must be learnt
The nature of the video medium
Fieldworkers’ preferences in an age obsessed
with light weight and miniaturisation are opposed
to methods for making good video:
robust tripod
things that are inevitably analogue such as lenses,
lighting
Workflow and workload
Disorder of magnitudes
Skills, workload, intrusion, volumes - all increase
by orders of magnitude
skills - equipment, shooting, editing, production
equipment - choice, usage, maintenance
power supplies
capturing, conversion
annotation
editing, production
data volumes
Workflow and workload
Video processing workflow (Wootton):
“shoot and edit sympathetically … convert to a useful
format"
bringing the video into the system - ingesting
temporal preprocessing - dealing with timing
spatial preprocessing - dealing with sizing
color correction - grading and picture quality
noise removal - cleaning it up
audio preparation
encoding the content
postprocessing and delivery
Workflow and workload
Annotation:
could easily involve a time ratio of up to 100 (1 hour
of video may take 100 hours to process)
in practice, most documenters do not annotate the
phenomena that they did (or didn’t) identify
fallacy that annotation etc can be done later
video amplifies the value of event-participant knowledge
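
The time ratio above can be turned into a rough planning estimate. The sketch below is illustrative only: the ratio of 100 is the slide's upper bound, while the function name and the 40-hour working week are assumptions introduced here for the example.

```python
def annotation_workload(video_hours, ratio=100, hours_per_week=40):
    """Return (processing hours, working weeks) for a given amount of video.

    ratio: processing hours per hour of video (the slide's "up to 100").
    hours_per_week: assumed length of a working week (not from the slide).
    """
    processing_hours = video_hours * ratio
    return processing_hours, processing_hours / hours_per_week


hours, weeks = annotation_workload(10)
print(f"10 hours of video: up to {hours} hours of processing, "
      f"about {weeks:g} working weeks")
```

Lowering `ratio` shows how quickly the estimate changes when only some phenomena are annotated.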
Workflow and workload
• Data volumes, e.g. for a 4 GB DVD project:
  • project files, originals, backups (for reversion), disk images
  • 5 minutes of MPEG-2 video at DVD-equivalent quality occupies ~150 MB
  • 5 minutes at DV quality (which you might use for editing) occupies ~1 GB (this is not studio quality, which would be 5-6 GB)
  • assuming semi-professional editing software that makes "nondestructive editing … using an EDL or reference movie that retains all the source components intact"
  • total volume for the DVD production is ~100 GB (which is largely the single copy of the original DV quality assets that are necessary for editing)
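
The per-minute rates implied by these figures can be worked through as a small calculation. This is a sketch under stated assumptions: 1 GB is taken as 1000 MB, the function names are hypothetical, and `shooting_ratio` (source footage shot per minute used) is a parameter invented for the example. It does not try to reproduce the ~100 GB total, which also covers project files, backups and disk images.

```python
# Rates derived from the slide's figures (approximate).
MPEG2_MB_PER_MIN = 150 / 5    # ~150 MB per 5 minutes at DVD-equivalent quality
DV_MB_PER_MIN = 1000 / 5      # ~1 GB per 5 minutes at DV quality (1 GB = 1000 MB assumed)


def minutes_on_disc(disc_mb, mb_per_min=MPEG2_MB_PER_MIN):
    """Minutes of video that fit on a disc of disc_mb megabytes."""
    return disc_mb / mb_per_min


def dv_source_mb(minutes, shooting_ratio=1.0):
    """MB of DV-quality source needed for `minutes` of finished video.

    shooting_ratio is an assumed multiplier for footage shot but not used.
    """
    return minutes * shooting_ratio * DV_MB_PER_MIN


finished_minutes = minutes_on_disc(4000)          # a 4 GB DVD
print(round(finished_minutes), "minutes of MPEG-2 video")
print(round(dv_source_mb(finished_minutes) / 1000, 1), "GB of DV source at a 1:1 shooting ratio")
print(round(dv_source_mb(finished_minutes, shooting_ratio=3) / 1000, 1), "GB at an assumed 3:1 ratio")
```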
Community skills and needs
On Hippocrisy
Hippocratic approach: working ‘for the benefit of
the ill’
Video offers a good candidate for:
community involvement
skills transfer
creating directly usable materials, including for
revitalisation
Community skills and needs
ELAN isn’t a usable presentation
but it can be used as editor to generate VCDs etc
(Jukes)
We’d need to observe what kinds of video are
current and effective in the community (McGill)
Can video be put in community hands (unlike
other linguistic aspects) because it involves no
linguistic methodology?
Do we patronise a language community by not
applying worked-out methods?
Data portability and archiving
Handling the bytes that feed
(More pictures without captions / songs without titles etc)
there are standards, e.g. MPEG, ELAN (eaf)
professional knowledge and equipment needed
for processing, encoding, migration
Data portability and archiving
Archivism:
skewed proportion of discussion about technology
instead of methodology, technique and goals
technical parameters as proxy for quality and effective
outcomes
hides severe limitation on dissemination of “raw” video
But technical advice has also been selective!
Data portability and archiving
Shooting technique and preservation quality:
camera movement and poor picture quality can
overwhelm compression algorithms
so poor techniques (eg non-use of tripod,
unnecessary pan or zoom, non-awareness of scene
evolution) cause the same "loss of information" that
has been so vilified in the case of compressed audio
Data portability and archiving
• Necessity for compression violates the whole rationale for digital preservation:
  • MPEG conversions introduce the same “generational loss” as analogue copying. “Analogue ... generational loss is supposed to be eliminated when you record the video digitally. But this is only the case if no format conversion takes place during the digital transfer. Changing the encoding from one type to another results in generational losses even in the digital domain.”
  • format refreshment or editing for mobilisation will make re-encoding inevitable
• Editing should be done from high resolution or uncompressed versions
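
The generational-loss point can be illustrated with a toy model. The sketch below is not MPEG or any real codec: each "format" is simply rounding to a different step size, and the step sizes 7 and 5 are arbitrary assumptions. It only shows why converting from one lossy encoding to another can leave a larger error than encoding once from the original.

```python
def quantise(value, step):
    """Round an integer to the nearest multiple of step (a toy lossy 'encoder')."""
    return ((value + step // 2) // step) * step


def max_error(original, encoded):
    """Largest absolute difference between original and encoded samples."""
    return max(abs(a - b) for a, b in zip(original, encoded))


samples = list(range(256))                        # stand-in for 8-bit pixel values

direct_b = [quantise(v, 5) for v in samples]      # encoded once, straight from the original
format_a = [quantise(v, 7) for v in samples]      # archived in a first lossy format
via_a = [quantise(v, 5) for v in format_a]        # later converted from format A to format B

print("error, original -> B:     ", max_error(samples, direct_b))
print("error, original -> A -> B:", max_error(samples, via_a))
```

In this toy the converted copy is worse than a copy made directly from the original, which is the reason for editing and migrating from high-resolution or uncompressed versions.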
Data portability and archiving
Storage costs may have to be revisited:
if highly compressed MPEG-2 is no longer accepted
if distributed storage strategies such as suggested in
LAN 9 become commonplace, since costs vary
according to scale of storage units
then Wittenburg's calculations (LAN 10) will not apply
Other archive costs:
dissemination (genres, management of protocol) ???
ELAR holdings by data type
• This table analyses some data types of interest for a representative sample (70%) of holdings
• Data type by volume and number of files, sorted by volume
Data type   Volume (MB)   Files
audio           360,411   6,312
video           208,995     895
image            28,592   2,221
msword              223     404
pdf                 196     134
eaf                  33     176
text                 32     781
lex                   9      29
trs                   5     246
xls                   1      19
imdi                  1      26
ELAR holdings by data type
• This table analyses some data types of interest for a representative sample (70%) of holdings
• Data type by number of files and volume, sorted by number of files
Data type   Files   Volume (MB)
audio       6,312       360,411
image       2,221        28,592
video         895       208,995
text          781            32
msword        404           223
trs           246             5
eaf           176            33
pdf           134           196
lex            29             9
imdi           26             1
xls            19             1
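
The two orderings above can be combined into a single per-type summary. The sketch below copies the figures from the tables (a 70% sample of holdings, so the shares apply to that sample only). The function name and output layout are assumptions made for the example.

```python
# (volume in MB, number of files) per data type, copied from the tables
holdings = {
    "audio": (360_411, 6_312),
    "video": (208_995, 895),
    "image": (28_592, 2_221),
    "msword": (223, 404),
    "pdf": (196, 134),
    "eaf": (33, 176),
    "text": (32, 781),
    "lex": (9, 29),
    "trs": (5, 246),
    "xls": (1, 19),
    "imdi": (1, 26),
}

total_mb = sum(mb for mb, _ in holdings.values())
total_files = sum(files for _, files in holdings.values())


def summarise(data_type):
    """Mean file size and shares of volume and file count for one data type."""
    mb, files = holdings[data_type]
    return {
        "mean_mb_per_file": mb / files,
        "share_of_volume": mb / total_mb,
        "share_of_files": files / total_files,
    }


for data_type in ("audio", "video", "image"):
    s = summarise(data_type)
    print(f"{data_type}: {s['mean_mb_per_file']:.1f} MB/file, "
          f"{s['share_of_volume']:.0%} of volume, {s['share_of_files']:.0%} of files")
```

Comparing `share_of_volume` with `share_of_files` for video makes the storage imbalance explicit: few files, but a large part of the archive's bytes.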
Conclusion
Video can:
add to the representational methods used by
linguistics
encourage us to look at diverse phenomena
challenge our methodologies
provide new and effective ways of disseminating
language and cultural events and knowledge
Conclusion
A comparison: video vs multimedia
why few exhortations to produce multimedia?
multimedia:
distinguishes medium from mode of knowledge
representation
richer and more explicit interleaving of various
types of knowledge
imposes its workload/costs in more appropriate
ways
Conclusion
Generic, amateur video fails to respect
participants by not recognising linguistic
specialisation, complexity or expertise to the
same degree as “real” linguistic work
Naive video achieves “authenticity” mainly by not
editing - and thereby not producing usable
products!
Conclusion
There is a lot of tradition in evaluating the
descriptive value of linguistic work, but little in
defining the documentation value of video
If video really does represent the claimed range
of linguistic phenomena, it is a key mode of
documentation: then documenters (and their
teachers) need to pay much closer attention to
goals and methodologies!
It is not clear that it is linguists who should be
making video