Lexicon Standards and the LEGO Project

LIFTing LEGO with RELISH:
Lexicon Interchange FormaT in Use
Helen Aristar-Dry
Institute for Language Information and Technology
Eastern Michigan U
Outline
 Background: The RELISH project
 The LEGO project
 Project interface
 Project workflow
 LIFT: Lexicon Interchange Format
 What it is
 Use in the LEGO project (“LL-LIFT”)
 Sample entry: comparison with LMF-compliant XML output by LEXUS
The RELISH Project
 RELISH: Rendering Endangered Lexicons
Interoperable through Standards Harmonization
 Joint project: U of Frankfurt, MPI-Nijmegen,
LINGUIST List
 Goal: markup harmonization at two levels
 Semantic
 Structural
 Test cases: 6 lexicons from LEGO & LEXUS
RELISH: MPI, ILIT and Lexicon Standards
[Diagram: LEGO & RELISH]
LEGO Test Lexicons: LL-LIFT, GOLD
LEXUS Test Lexicons: LMF-compliant XML (various), ISOCats
RELISH interchange format: TEI? LIFT?
The LEGO Project
 3-year project sponsored by the NSF
 Participants: ILIT (Linguist List) & U at
Buffalo
 Goal: a “datanet” of interoperable lexicons.
Interoperability based on:
 grammatical information mapped to GOLD
 structure mapped to a common schema (LL-LIFT)
 output in RDF
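A minimal sketch of what RDF output for one entry might look like once its part of speech has been mapped to GOLD. The `LEGO` namespace and the two predicate names are placeholders invented for illustration, and the GOLD base URI should be checked against the current GOLD release; none of this is taken from the LEGO system itself.

```python
# Assumed GOLD base URI; verify against the published ontology before use.
GOLD = "http://purl.org/linguistics/gold/"
# Placeholder namespace and predicates, not the project's real vocabulary.
LEGO = "http://example.org/lego/"


def entry_to_ntriples(entry_id, lexeme, gold_concept):
    """Return N-Triples lines for one entry and its GOLD-mapped part of speech."""
    subject = f"<{LEGO}entry/{entry_id}>"
    # Escape backslashes and double quotes inside the literal.
    escaped = lexeme.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'{subject} <{LEGO}lexicalUnit> "{escaped}" .',
        f"{subject} <{LEGO}hasGrammaticalInfo> <{GOLD}{gold_concept}> .",
    ]


triples = entry_to_ntriples("d1e56244", "дадагалзаар", "Noun")
print("\n".join(triples))
```

Because every lexicon would point at the same GOLD URI for "Noun", a query over the combined triples can select nouns regardless of the label each lexicon originally used.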
The LEGO Project
 Initial set of lexical resources:
 17 EL lexicons prepared by LINGUIST List:
 Shoshone, Archi, Kayardild, Fulfulde, Mocovi,
Biao Mien, Potawatomi, Saliba, W. Pantar, W.
Sissala, Wichi, …
 3000+ wordlists prepared by U at Buffalo:
Usher-Whitehouse lists, Loanword Typology Lists,
Intercontinental Dictionary Series lists.
 Extensible:
 In practice: to Lexicons in LIFT
 Possibly: RDF, TEI (or another official
serialization of LMF)
The LEGO Project
 Purposes
 Not intended to develop a lexicon creation
or lexicon display tool
 Intended to
 support multi-lexicon search and
comparison
 demonstrate the value of digital
standards in linguistic research
LEGO Workflow
Lexicon source (MS Word, Excel, Access, Toolbox, Filemaker)
-> Descriptive XML
-> LL-LIFT (XML)
-> LEGO db
-> mapping to GOLD
-> LEGO Interoperable Lexicon
LEGO: lego.linguistlist.org
LEGO Browse Lexicons Page
LEGO: Lexicon ‘homepage’
LEGO: Browse Lexicon Entries
Demonstration of Interoperability: LEGO Search
LEGO: Search multiple lexicons by LIFT field (Definition, Example, Variant, etc.)
LEGO: Search multiple lexicons by grammatical information (GOLD concept).
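A shared schema is what makes field-level search across lexicons simple. The sketch below is an illustration only, not the LEGO implementation: it searches one LIFT field in several in-memory lexicons. The first lexicon reuses the Tuvan sample entry from these slides; the second entry, the lexicon names, and the `search` function are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two tiny stand-in lexicons; the "demo" entry is invented for illustration.
LEXICONS = {
    "tuvan": """<lift><entry id="d1e56244">
        <lexical-unit><form lang="syv"><text>дадагалзаар</text></form></lexical-unit>
        <sense><grammatical-info value="n."/>
          <definition><form lang="eng"><text>doubt</text></form></definition>
        </sense></entry></lift>""",
    "demo": """<lift><entry id="x1">
        <lexical-unit><form lang="und"><text>abc</text></form></lexical-unit>
        <sense><grammatical-info value="v."/>
          <definition><form lang="eng"><text>to doubt</text></form></definition>
        </sense></entry></lift>""",
}


def search(lexicons, field, query):
    """Yield (lexicon name, entry id, matching text) for each hit in `field`."""
    for name, xml_text in lexicons.items():
        root = ET.fromstring(xml_text)
        for entry in root.iter("entry"):
            for element in entry.iter(field):
                for text in element.iter("text"):
                    if text.text and query.lower() in text.text.lower():
                        yield name, entry.get("id"), text.text


hits = list(search(LEXICONS, "definition", "doubt"))
for hit in hits:
    print(hit)
```

The same function works for `example` or `variant` because every lexicon uses the same element names for those fields.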
Extending LEGO: Upload Lexicon or Wordlist
Map to GOLD: Choose Lexicon to Edit
Mapping 1: View lexicon’s labels for grammatical concepts
Mapping 2: Click to view GOLD concepts, Click ‘Add’ to Map
Mapping 3: Access GOLD interface (if needed)
Mapping 4: Lexicon label mapped to GOLD concept
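The mapping steps can be sketched in code, under stated assumptions: collect the distinct grammatical labels a lexicon uses (Mapping 1), record a label-to-GOLD choice (Mapping 2 to 4), and apply it. The sample lexicon, the helper names, and the `label_to_gold` table are hypothetical; in LEGO the choice is made by hand in the web interface.

```python
import xml.etree.ElementTree as ET

# Invented three-entry lexicon, only for illustration.
LEXICON = """<lift>
  <entry id="a"><sense><grammatical-info value="n."/></sense></entry>
  <entry id="b"><sense><grammatical-info value="v."/></sense></entry>
  <entry id="c"><sense><grammatical-info value="n."/></sense></entry>
</lift>"""


def lexicon_labels(root):
    """Mapping 1: list the distinct labels the lexicon uses."""
    return sorted({gi.get("value") for gi in root.iter("grammatical-info")})


def apply_mapping(root, mapping):
    """Group entry ids by GOLD concept; collect ids whose label is unmapped."""
    mapped, unmapped = {}, []
    for entry in root.iter("entry"):
        for gi in entry.iter("grammatical-info"):
            label = gi.get("value")
            if label in mapping:
                mapped.setdefault(mapping[label], []).append(entry.get("id"))
            else:
                unmapped.append(entry.get("id"))
    return mapped, unmapped


root = ET.fromstring(LEXICON)
labels = lexicon_labels(root)
label_to_gold = {"n.": "Noun"}  # hypothetical result of the manual mapping step
mapped, unmapped = apply_mapping(root, label_to_gold)
print(labels, mapped, unmapped)
```

Keeping the unmapped ids visible shows which labels still need a GOLD concept before the lexicon is fully searchable by grammatical information.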
LIFT
 LIFT = Lexicon Interchange Format
 XML format for storing and exchanging
lexical information
 Developed by SIL International
 Designed to be easy to convert into and
out of MDF and Fieldworks formats
 Current version: http://code.google.com/p/lift-standard/downloads/detail?name=lift_13.pdf
Programs that support LIFT
 WeSay uses LIFT as its primary format.
 FieldWorks Language Explorer
(FLEx) can import and export LIFT files.
 Lexique Pro can open LIFT documents
for viewing, printing, and making web
pages. It can also save to LIFT format.
(from http://code.google.com/p/lift-standard/)
Utilities for LIFT
 Solid can convert basic SFM (standard
format markers, e.g. Toolbox format) to
LIFT (see: http://lingtransoft.info/apps/solid)
 LiftTweaker can selectively modify a LIFT
file for targeting different audiences
Lexicons in LIFT
 LIFT chosen as upload format for LEGO
because of the large number of lexicons
potentially available in LIFT
 About 50 published lexicons in Lexique Pro
 180+ lexicons in Fieldworks Language
Explorer (FLEx) ?
 300+ lexicons in Shoebox/Toolbox ?
 With the owner’s permission, these could
easily be integrated into the LEGO
system
LIFT UML Diagram for Entry
Notes
 Grammatical Info is attached to Sense,
not to Entry or Form (differs from LMF)
 Variant is attached to Entry, not to Sense:
a variant cannot carry a sense of its own
 Multiple senses and variants allowed
 Highly customizable: Field, Type, and
Range can be added to virtually any
element (can be defined in the document
header)
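To make the attachment points concrete, here is a small sketch that reads an entry the way the notes above describe it: variants are children of the entry, while grammatical information is read from each sense. The English demo entry is invented for illustration and is not from any LEGO lexicon.

```python
import xml.etree.ElementTree as ET

# Invented entry with one variant and two senses.
ENTRY = """<entry id="demo">
  <lexical-unit><form lang="und"><text>run</text></form></lexical-unit>
  <variant><form lang="und"><text>runn</text></form></variant>
  <sense id="s1"><grammatical-info value="v."/>
    <definition><form lang="eng"><text>move fast</text></form></definition></sense>
  <sense id="s2"><grammatical-info value="n."/>
    <definition><form lang="eng"><text>an act of running</text></form></definition></sense>
</entry>"""

entry = ET.fromstring(ENTRY)

# Variants hang off the entry, not off any sense.
variants = [v.findtext("form/text") for v in entry.findall("variant")]

# Grammatical info is looked up inside each sense.
senses = []
for sense in entry.findall("sense"):
    gi = sense.find("grammatical-info")
    senses.append((sense.findtext("definition/form/text"),
                   gi.get("value") if gi is not None else None))

print(variants)
print(senses)
```

One consequence of this layout is that a single headword can be a verb in one sense and a noun in another without needing two entries.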
LL-LIFT
 Lack of constraint on use of Field
and Trait constituted a problem for
LEGO
 Developed ‘LL-LIFT’
 a constrained form of LIFT
 which still validates against the
LIFT schema
LL-LIFT
 Major constraints
 No Header
 Grammatical Information confined to a
single element
 Delimited within db field
 Parsed out during GOLD mapping
 Minor entries, comparison forms, etc.
 separate entries
 unified via ‘relation’ element
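Two of these constraints are easy to check mechanically. The sketch below is an assumption-laden illustration, not the LEGO validator: it reads "confined to a single element" as at most one `grammatical-info` per sense, and it assumes `;` as the delimiter inside the value, since the slides do not name one. The function names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

DELIMITER = ";"  # assumption: the slides do not say which delimiter is used

# Invented document that deliberately violates both checked constraints.
DOC = """<lift>
  <header/>
  <entry id="e1"><sense>
    <grammatical-info value="n.; pl."/>
    <grammatical-info value="m."/>
  </sense></entry>
</lift>"""


def check_ll_lift(root):
    """Return a list of human-readable constraint violations."""
    problems = []
    if root.find("header") is not None:
        problems.append("header element present")
    for entry in root.iter("entry"):
        for sense in entry.findall("sense"):
            if len(sense.findall("grammatical-info")) > 1:
                problems.append(
                    f"entry {entry.get('id')}: more than one grammatical-info in a sense")
    return problems


def split_grammatical_info(value):
    """Split a delimited grammatical-info value into separate labels."""
    return [part.strip() for part in value.split(DELIMITER) if part.strip()]


problems = check_ll_lift(ET.fromstring(DOC))
parts = split_grammatical_info("n.; pl.")
print(problems, parts)
```

Splitting the value at mapping time, rather than at import time, matches the slide's note that the labels are parsed out during GOLD mapping.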
lexical-unit
dialects
note
sense
grammatical-info
paradigmatic traits
definition
example
note
variant
relation
<entry id="d1e56244">
<trait name="original-id" value="2c9090a22632946601267b22f98e5098"/>
<lexical-unit>
<form lang="syv">
<text>дадагалзаар</text>
</form>
</lexical-unit>
<variant>
<form lang="syv">
<text>dadagalzaar</text>
</form>
</variant>
<sense>
<grammatical-info value="n."/>
<definition>
<form lang="eng">
<text>doubt</text>
</form>
</definition>
</sense>
</entry>
Tuva entry in LIFT
<LexicalEntry id="2c9090a22632946601267b22f98e5098">
<DataCategory type="rank">5155</DataCategory>
<DataCategory type="lexeme">дадагалзаар</DataCategory>
<DataCategory type="part of speech">n.</DataCategory>
<Form id="2c9090a22632946601267b22f9fd50a0" type="form">
<DataCategory type="image"/>
<DataCategory type="Audio">tvn_5155.mp3</DataCategory>
</Form>
<Sense id="2c9090a22632946601267b22f9fd50a6" type="sense">
<DataCategory type="gloss">doubt</DataCategory>
<DataCategory type="transcription">dadagalzaar</DataCategory>
</Sense>
<ListOfComponents/>
</LexicalEntry>
Tuva entry exported from LEXUS (LMF-compliant)
<LexicalEntry id="2…">
<DatCat type="rank">5155</DatCat>
<DatCat type="lexeme">дадагалзаар</DatCat>
<DatCat type="part of speech">n.</DatCat>
<Form id="2…" type="form">
<DatCat type="image"/>
<DatCat type="Audio">tvn_5155.mp3</DatCat>
</Form>
<Sense id="2…" type="sense">
<DatCat type="gloss">doubt</DatCat>
<DatCat type="transcription">dadagalzaar</DatCat>
</Sense>
<ListOfComponents/>
</LexicalEntry>
LMF
<entry id="d1e56244">
<trait name="original-id" value="2…"/>
<lexical-unit>
<form lang="syv">
<text>дадагалзаар</text>
</form>
</lexical-unit>
<variant>
<form lang="syv">
<text>dadagalzaar</text>
</form>
</variant>
<sense>
<grammatical-info value="n."/>
<definition>
<form lang="eng">
<text>doubt</text>
</form>
</definition>
</sense>
</entry>
LL-LIFT
Summary
 LL-LIFT = current XML schema used in
LEGO
 Easy to transform (at least one) LMF-compliant
XML output of LEXUS to LL-LIFT
 Transformation could be extended to the
TEI serialization of LMF; this may require
additional constraints
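A hedged sketch of such a transformation, inferred only from comparing the two Tuvan sample entries in these slides: lexeme to `lexical-unit`, transcription to `variant`, part of speech to `grammatical-info`, gloss to `definition`, and the LEXUS id kept as an `original-id` trait. Rank, image, and audio are dropped, as in the sample. The function names are hypothetical, and the entry id and language codes are passed in because the LEXUS export does not contain them; the project's actual conversion may differ.

```python
import xml.etree.ElementTree as ET

LEXUS_ENTRY = """<LexicalEntry id="2c9090a22632946601267b22f98e5098">
<DataCategory type="rank">5155</DataCategory>
<DataCategory type="lexeme">дадагалзаар</DataCategory>
<DataCategory type="part of speech">n.</DataCategory>
<Form id="2c9090a22632946601267b22f9fd50a0" type="form">
<DataCategory type="image"/>
<DataCategory type="Audio">tvn_5155.mp3</DataCategory>
</Form>
<Sense id="2c9090a22632946601267b22f9fd50a6" type="sense">
<DataCategory type="gloss">doubt</DataCategory>
<DataCategory type="transcription">dadagalzaar</DataCategory>
</Sense>
<ListOfComponents/>
</LexicalEntry>"""


def datcat(parent, type_name):
    """Return the stripped text of the DataCategory child of this type, or None."""
    for dc in parent.findall("DataCategory"):
        if dc.get("type") == type_name:
            return (dc.text or "").strip()
    return None


def add_form(parent, lang, text):
    """Append <form lang=...><text>...</text></form> to parent."""
    form = ET.SubElement(parent, "form", lang=lang)
    ET.SubElement(form, "text").text = text


def lexus_to_ll_lift(lexical_entry, entry_id, vernacular="syv", analysis="eng"):
    """Build an LL-LIFT <entry> from one LEXUS <LexicalEntry> element."""
    entry = ET.Element("entry", id=entry_id)
    ET.SubElement(entry, "trait", name="original-id", value=lexical_entry.get("id"))
    add_form(ET.SubElement(entry, "lexical-unit"), vernacular,
             datcat(lexical_entry, "lexeme"))
    sense_in = lexical_entry.find("Sense")
    add_form(ET.SubElement(entry, "variant"), vernacular,
             datcat(sense_in, "transcription"))
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "grammatical-info",
                  value=datcat(lexical_entry, "part of speech"))
    add_form(ET.SubElement(sense, "definition"), analysis, datcat(sense_in, "gloss"))
    return entry


lift = lexus_to_ll_lift(ET.fromstring(LEXUS_ENTRY), "d1e56244")
print(ET.tostring(lift, encoding="unicode"))
```

Note that part of speech moves from the entry level in the LEXUS export to the sense level in LL-LIFT, which is the structural difference from LMF pointed out in the LIFT notes.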