Introduction to ELAN
Mary Chambers
ELAP, Department of Linguistics, SOAS
What is ELAN?
• EUDICO Linguistic Annotator
• Annotation tool developed by MPI: create, edit, view and search annotations for video and audio data
• Links text annotations with audio and/or video data
• One audio stream, up to four video streams
• Annotations are on tiers, which can be independent or linked to other tiers
• No limit to the number of tiers
• Tiers can be hidden or rearranged for ease of use
• ELAN files can be exported in a variety of formats (including to Shoebox/Toolbox for interlinearisation, then reimported)
Demonstration...
Tiers, types, and stereotypes...
Imagine an annotated text with two speakers,
with a transcription and free translation
There are 4 tiers: tx@speakerA, tx@speakerB,
ft@speakerA, ft@speakerB
There are 2 types of tier: tx (text), and ft (free
translation)
Each 'type' is further categorised according to its
stereotype - the way tiers of this type combine with
other tiers...
Tiers
• Each speaker can have their own set of tiers, so overlapping speech is not a problem.
• Tiers can contain many kinds of annotations. Some of the most obvious are:
    • IPA transcription
    • practical orthographic transcription
    • free translations into languages of wider communication
    • morphemes and glosses
    • gesture annotation
    • grammar notes
    • any other information which seems relevant
Linguistic types
• Every annotation tier must be assigned a linguistic type, which tells ELAN what type of information the tier contains.
• Stereotypes:
    • None: the annotation on the tier is linked directly to the time axis (e.g. intonation units/sentences: a transcription or a reference number).
    • Time Subdivision: the annotation on the parent tier can be subdivided into smaller units which, in turn, can be linked to time intervals (e.g. words). There cannot be gaps between units.
    • Symbolic Subdivision: similar to Time Subdivision, except that the smaller units cannot be linked to a time interval (e.g. morphemes within words).
    • Included In: like Time Subdivision, but there can be gaps (e.g. words with silence between them).
    • Symbolic Association: one-to-one association with a parent tier, e.g. transcription with ref field, gloss with morpheme, free translation with sentence.
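The difference between Time Subdivision and Included In can be sketched as a small check on time intervals. This is only an illustration, not ELAN's own code: the `Interval` class and `fits_parent` function are hypothetical names, and the sketch assumes that Time Subdivision children must cover the parent annotation completely.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start_ms: int
    end_ms: int


def fits_parent(children, parent, allow_gaps):
    """Check child intervals against a parent annotation.

    allow_gaps=False models Time Subdivision (assumed: children tile the parent).
    allow_gaps=True models Included In (gaps such as silence are allowed).
    """
    ordered = sorted(children, key=lambda c: c.start_ms)
    if not ordered:
        return True
    # Every child must have positive length and lie inside the parent.
    for c in ordered:
        if c.end_ms <= c.start_ms:
            return False
        if c.start_ms < parent.start_ms or c.end_ms > parent.end_ms:
            return False
    # Neighbouring children may never overlap; gaps depend on the stereotype.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start_ms < prev.end_ms:
            return False
        if not allow_gaps and nxt.start_ms != prev.end_ms:
            return False
    if not allow_gaps:
        return (ordered[0].start_ms == parent.start_ms
                and ordered[-1].end_ms == parent.end_ms)
    return True
```

For example, two words at 0-400 ms and 600-1000 ms inside a 0-1000 ms utterance would pass as Included In but fail as Time Subdivision, because of the 200 ms of silence between them.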
Tier dependencies: parents and children
Document X
    Text/utterances (speaker A) (type: none)
        Words (type: time subdivision)
            Morphemes (type: symbolic subdivision)
                Parts of speech (type: symbolic association)
                Glosses (type: symbolic association)
        Free translations (type: symbolic association)
    Text/utterances (speaker B)
        Words
            Morphemes
                Parts of speech
                Glosses
        Free translations
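The parent/child structure for one speaker can be written down as a small lookup table. This is a rough sketch only: the tier names are made up for illustration, and placing parts of speech and glosses under morphemes, and free translations under the utterance tier, is an assumption about how the diagram is meant to be read.

```python
# Hypothetical tier names; each maps to (parent tier or None, stereotype).
TIERS = {
    "utterances@A": (None, "None"),
    "words@A": ("utterances@A", "Time Subdivision"),
    "morphemes@A": ("words@A", "Symbolic Subdivision"),
    "pos@A": ("morphemes@A", "Symbolic Association"),
    "gloss@A": ("morphemes@A", "Symbolic Association"),
    "ft@A": ("utterances@A", "Symbolic Association"),
}


def ancestors(tier):
    """Return the chain of parent tiers, nearest parent first."""
    chain = []
    parent = TIERS[tier][0]
    while parent is not None:
        chain.append(parent)
        parent = TIERS[parent][0]
    return chain


def children(tier):
    """Return the tiers that depend directly on the given tier."""
    return [name for name, (parent, _) in TIERS.items() if parent == tier]
```

Here `ancestors("gloss@A")` walks up through morphemes and words to the utterance tier, which is the only tier in this set linked directly to the time axis.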
Is it worth it?
Time-alignment is time-consuming!
Tiers, types, and stereotypes only have to be set
up once
Output is time-aligned transcription in XML
which can be used for many purposes
Archival
Import to Toolbox for interlinearisation
Import to DVD-authoring software
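Because the output is XML, the time-aligned annotations can be read with ordinary XML tools. The sketch below uses Python's standard library on a heavily trimmed sample. The element and attribute names follow the ELAN file format as I understand it, the sample is not a complete valid ELAN file, and `read_aligned` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample; a real ELAN file has a header and type definitions.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="tx@speakerA" LINGUISTIC_TYPE_REF="tx" PARTICIPANT="speakerA">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="ft@speakerA" LINGUISTIC_TYPE_REF="ft" PARENT_REF="tx@speakerA">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>greeting</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_aligned(eaf_text):
    """Return (tier, start_ms, end_ms, text) for each time-aligned annotation."""
    root = ET.fromstring(eaf_text)
    # Map time slot IDs to milliseconds, skipping slots without a time value.
    slots = {
        s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
        for s in root.iter("TIME_SLOT")
        if s.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows
```

The free translation tier is skipped here because its annotation refers to a parent annotation rather than to time slots; resolving those references would be the next step for an archival or subtitle export.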
Different workflows are possible
ELAN files can be imported/exported in a variety
of formats, including Shoebox/Toolbox
Toolbox → ELAN
ELAN → Toolbox
Transcriber → ELAN → Toolbox
Back and forth?
Working with Toolbox
• This is not entirely straightforward, but is not too difficult if you are already quite familiar with the workings of Toolbox and the structure of its files.
• If you know you want to export to Toolbox, it’s better to start from the beginning with a ref type and tier (stereotype: None). This tier will only contain time information at first (i.e. it will be empty), but will later contain a Toolbox ref number. The transcription tier will be a symbolic association dependent on the ref tier.
• The Toolbox export process puts the time and speaker information in separate fields. After working in Toolbox, ELAN can import the file, and the time and speaker information will be preserved.
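To give a feel for what "time and speaker information in separate fields" means, here is a sketch that reads one Toolbox-style record into a dictionary. The record and all marker names in it (`\ref`, `\ELANBegin`, `\ELANEnd`, `\ELANParticipant`, `\tx`, `\ft`) are assumptions for illustration, so check them against your own export. `parse_record` is a hypothetical helper.

```python
# Illustrative record only; marker names are assumed, not taken from a real export.
RECORD = r"""\ref 001
\ELANBegin 0.000
\ELANEnd 1.500
\ELANParticipant speakerA
\tx hello
\ft greeting"""


def parse_record(text):
    """Map each backslash marker in a record to its field value."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # ignore blank lines and continuation text in this sketch
        marker, _, value = line[1:].partition(" ")
        fields[marker] = value.strip()
    return fields
```

As long as the time and speaker fields are left untouched while interlinearising in Toolbox, they are still present when the file goes back into ELAN.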
Any questions?