Introduction to ELAN
Mary Chambers
ELAP, Department of Linguistics, SOAS

What is ELAN?
- EUDICO Linguistic Annotator
- Annotation tool developed by MPI: create, edit, view and search annotations for video and audio data
- Links text annotations with audio and/or video data
- One audio stream, up to four video streams
- Annotations are on tiers, which can be independent or linked to other tiers
- No limit to the number of tiers
- Tiers can be hidden or rearranged for ease of use
- ELAN files can be exported in a variety of formats (including to Shoebox/Toolbox for interlinearisation, then reimported)

Demonstration...

Tiers, types, and stereotypes...
- Imagine an annotated text with two speakers, with a transcription and free translation
- There are 4 tiers: a transcription tier and a free translation tier for each of the two speakers
- There are 2 types of tier: tx (text) and ft (free translation)
- Each 'type' is further categorised according to its stereotype: the way tiers of this type combine with other tiers...

Tiers
- Each speaker can have their own set of tiers, so overlapping speech is not a problem.
- Tiers can contain many kinds of annotations. Some of the most obvious are:
  - IPA transcription
  - practical orthographic transcription
  - free translations into languages of wider communication
  - morphemes and gloss
  - gesture annotation
  - grammar notes
  - any other information which seems relevant

Linguistic types
- Every annotation tier must be assigned a linguistic type, which tells ELAN what type of information the tier contains.

Stereotypes
- None: The annotation on the tier is linked directly to the time axis (e.g. intonation units/sentences: a transcription or a reference number).
- Time Subdivision: The annotation on the parent tier can be subdivided into smaller units, which, in turn, can be linked to time intervals (e.g. words). There cannot be gaps between units.
- Symbolic Subdivision: Similar to Time Subdivision, except that the smaller units cannot be linked to a time interval (e.g. morphemes within words).
- Included In: Like Time Subdivision, but there can be gaps (e.g. words, with silence between them).
- Symbolic Association: One-to-one association with a parent tier, e.g. transcription with ref field, gloss with morpheme, free translation with sentence.

Tier dependencies: parents and children
Document X, with the same set of tiers for speaker A and for speaker B:

  Tier                 Type
  Text/utterances      (none)
  Words                (time subdivision)
  Morphemes            (symbolic subdivision)
  Parts of speech      (symbolic association)
  Glosses              (symbolic association)
  Free translations    (symbolic association)

Is it worth it?
- Time-alignment is time-consuming!
- Tiers, types, and stereotypes only have to be set up once
- Output is time-aligned transcription in XML, which can be used for many purposes:
  - Archival
  - Import to Toolbox for interlinearisation
  - Import to DVD-authoring software

Different workflows are possible
- ELAN files can be imported/exported in a variety of formats, including Shoebox/Toolbox
- Toolbox → ELAN
- ELAN → Toolbox
- Transcriber → ELAN → Toolbox
- Back and forth?

Working with Toolbox
- This is not entirely straightforward, but is not too difficult if you are already quite familiar with the workings of Toolbox and the structure of its files.
- If you know you want to export to Toolbox, it's better to start from the beginning with a ref type and tier (stereotype: None), which will only contain time information at first (i.e. it will be empty), but later will contain a Toolbox ref number.
- The transcription tier will be a symbolic association dependent on the ref tier.
- The Toolbox export process puts the time and speaker information in separate fields.
- After working in Toolbox, ELAN can import the file, and the time and speaker information will be preserved.

Any questions?
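
Supplementary sketch: tier dependencies as a data structure

The parent/child idea from the "Tier dependencies" slide may be easier to see as a small toy model. The Python below is only an illustration, not ELAN's own data model or file format. The `Tier` class, the `TIME_ALIGNABLE` table and `build_speaker_tiers` are hypothetical names invented for this sketch. The stereotype assigned to each tier follows the slide, but the exact nesting (for instance, parts of speech and glosses both hanging off morphemes) is an assumption.

```python
from typing import List, Optional

# Stereotype names from the slides, mapped to whether annotations on a tier
# of that stereotype can be linked to time intervals.
TIME_ALIGNABLE = {
    "None": True,
    "Time Subdivision": True,
    "Included In": True,
    "Symbolic Subdivision": False,
    "Symbolic Association": False,
}


class Tier:
    """One tier in a toy model of an annotated document (hypothetical class)."""

    def __init__(self, name: str, stereotype: str,
                 parent: Optional["Tier"] = None):
        if stereotype not in TIME_ALIGNABLE:
            raise ValueError(f"unknown stereotype: {stereotype}")
        # Every stereotype other than "None" is defined relative to a parent tier.
        if stereotype != "None" and parent is None:
            raise ValueError(f"{name}: stereotype {stereotype} needs a parent tier")
        self.name = name
        self.stereotype = stereotype
        self.parent = parent
        self.children: List["Tier"] = []
        if parent is not None:
            parent.children.append(self)

    def outline(self, depth: int = 0) -> List[str]:
        """Return this tier and its descendants as indented text lines."""
        lines = [f"{'  ' * depth}{self.name} ({self.stereotype})"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


def build_speaker_tiers(speaker: str) -> Tier:
    """Build one speaker's tier set; the nesting is assumed, not from the slides."""
    text = Tier(f"Text/utterances ({speaker})", "None")
    words = Tier("Words", "Time Subdivision", parent=text)
    morphemes = Tier("Morphemes", "Symbolic Subdivision", parent=words)
    Tier("Parts of speech", "Symbolic Association", parent=morphemes)
    Tier("Glosses", "Symbolic Association", parent=morphemes)
    Tier("Free translations", "Symbolic Association", parent=text)
    return text


if __name__ == "__main__":
    for speaker in ("speaker A", "speaker B"):
        print("\n".join(build_speaker_tiers(speaker).outline()))
```

Running the script prints one indented outline per speaker. Because each speaker gets an independent copy of the same hierarchy, the types and stereotypes are defined once and reused, which is the point made on the "Is it worth it?" slide.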