D3.4 - eVikings II

eVikings II
Establishment of the Virtual Centre of Excellence for IST RTD in
Estonia
FP5 IST accompanying measures project IST-37592
Deliverable D3.4:
New dictionaries and linguistic software
packages available on Internet
Institute of the Estonian Language
Compiled by Urmas Sutrop and Ülle Viks
Tallinn
September 2004
Description
The goal of this package was to enable web-based access to a range of electronic dictionaries
and linguistic software. The Institute of the Estonian Language (IEL) was responsible for this task.
The main activity of IEL is the compilation of original academic dictionaries, all of which
also exist in electronic form. Because dictionaries began to be entered into computers some 25
years ago, several different mark-up systems are in use. Some dictionaries were already
accessible in KeeleWeb [Language Web] (http://ee.www.ee/) and on the IEL home page
(http://www.eki.ee/dict/), but large-scale electronic publication has been achieved in
Keelevara (see Deliverable D3.2).
Several software packages have been developed at IEL to help lexicographers and other
linguists:
• technical aids for lexicographers (sorting, structure modifications, comparisons, etc.)
• tools for compiling special dictionaries (entry and editing programs)
• a universal XML-based dictionary editor
• morphology modules (as freeware DLLs, http://www.eki.ee/tarkvara/): syllabification,
  part-of-speech and inflectional type recognition, stem generation, morphological
  synthesis and analysis
• a grammatical entry generator (for adding grammatical data to a dictionary entry)
During this project, several dictionaries were standardised, their source texts were
converted to an XML-based standard, and they were integrated into Keelevara
(see Lexical resources).
Some linguistic software packages (speech synthesiser, IEL modules of Estonian morphology,
Filosoft language software, translating browser) are accessible through Keelevara, but they
have not yet been fully integrated into its working environment. They are available via the
home page of the Language technology project, whose partners are the University of Tartu,
the Institute of the Estonian Language, the Institute of Cybernetics and Filosoft Ltd
(http://www.eki.ee/keeletehnoloogia/tarkvara.html).
The main target of the project has been achieved, but the preparation of new dictionaries will
continue. Final integration of the software packages into the Keelevara environment is under
development.
List of linguistic resources
1. Lexical resources
• Estonian language planning dictionary (ÕS 1999)
  Eesti keele sõnaraamat ÕS 1999. Tallinn, 1999 (50 000 entries)
  http://www.keelevara.ee/teosed/qs1999/
• Defining dictionary of contemporary Estonian (Seletussõnaraamat)
  Eesti kirjakeele seletussõnaraamat I–VI. Tallinn, 1988– (100 000 entries) (24 volumes available)
  http://www.keelevara.ee/teosed/seletav/
• Orthological dictionary for basic school (Õpilase ÕS)
  Õpilase ÕS. Tallinn, 2004 (22 000 entries)
  http://www.keelevara.ee/teosed/qpilase_qs/
• English-Estonian MT dictionary (Inglise-eesti masint.)
  http://www.keelevara.ee/teosed/en-et_masin/ (80 000 entries)
• Estonian-English MT dictionary (Eesti-inglise masint.)
  http://www.keelevara.ee/teosed/et-en_masin/ (80 000 entries)
• Russian-Estonian dictionary (Vene-eesti)
  Vene-eesti sõnaraamat I–IV. Tallinn, 1984–1994 (75 000 entries)
  http://www.keelevara.ee/teosed/ru-et_eki/
• Estonian-Russian dictionary (Eesti-vene)
  Eesti-vene sõnaraamat I–V. Tallinn, 1997– (60 000 entries) (3 volumes available)
  http://www.keelevara.ee/teosed/et-ru_eki/
• Norwegian-Estonian dictionary (Norra-eesti)
  T. Farbregd, S. Kangur, Ü. Viks. Norra-eesti eesti-norra sõnaraamat. Tallinn, 1998 (21 000 Norwegian entries)
  http://www.keelevara.ee/teosed/no-et_eksa/
• Estonian-Norwegian dictionary (Eesti-norra)
  T. Farbregd, S. Kangur, Ü. Viks. Norra-eesti eesti-norra sõnaraamat. Tallinn, 1998 (19 000 Estonian entries)
  http://www.keelevara.ee/teosed/et-no_eksa/
• Place names of the world (Maailma kohanimed)
  P. Päll, Maailma kohanimed. Tallinn, 1999 (4200 entries)
  http://www.keelevara.ee/teosed/maailma_kohanimed/
• Concise dialect dictionary (Väike murdesõnastik)
  Väike murdesõnastik I–II. Tallinn, 1982–1989 (80 000 entries)
  http://www.keelevara.ee/teosed/vaike_murdesqnastik/
• Poetic synonyms by J. Peegel (Poeetilised sünonüümid)
  J. Peegel, Nimisõna poeetilised sünonüümid eesti regivärssides. Tallinn, 2004 (412 pp)
  http://www.keelevara.ee/teosed/poeetilised/
• First dictionary of Estonian slang by M. Loog (Slängisõnaraamat)
  M. Loog, Esimene eesti slängisõnaraamat. Tallinn, 1991 (7500 entries)
  http://www.keelevara.ee/teosed/slang/
• Handbook of the Estonian language (Eesti keele käsiraamat)
  M. Erelt, T. Erelt, K. Ross. Eesti keele käsiraamat. Tallinn, 1997
  http://www.keelevara.ee/teosed/ekkr/
• Personal names of Estonia, 1900–2004 (Mehenimed, Naisenimed, Perekonnanimed)
  http://www.keelevara.ee/teosed/m_nimed/
  http://www.keelevara.ee/teosed/n_nimed/
  http://www.keelevara.ee/teosed/p_nimed/
Some additional dictionaries have been preprocessed and are ready to be integrated into the
portal (in-house versions are available at http://www.eki.ee/dict/):
• Estonian orthological dictionary (Õigekeelsussõnaraamat 1976)
  Õigekeelsussõnaraamat. Tallinn, 1976 (114 000 entries)
• Dictionary of Estonian idioms (A. Õim, Fraseoloogiasõnaraamat)
  A. Õim, Fraseoloogiasõnaraamat. Tallinn, 1993 (6500 entries)
• Dictionary of synonyms (A. Õim, Sünonüümisõnastik)
  A. Õim, Sünonüümisõnastik. Tallinn, 1991 (10 000 entries)
• Dictionary of antonyms (A. Õim, Antonüümisõnastik)
  A. Õim, Antonüümisõnastik. Tallinn, 1995 (2000 entries)
• Word index of Saareste's thesaurus (Saareste indeks)
  Andrus Saareste Eesti keele mõistelise sõnaraamatu indeks. Uppsala, 1979
• Etymological reference book (A. Raun, Etümoloogiline teatmik)
  A. Raun, Etümoloogiline teatmik. Bloomington, 1982
• Finnish-Estonian dictionary I, II (Soome-eesti suursõnaraamat)
  Soome-eesti suursõnaraamat I–II. Tallinn, 2003 (90 000 entries)
2. Language software resources
(on the home page of the Language technology project:
http://www.eki.ee/keeletehnoloogia/tarkvara.html)
• Nimisõnafraaside filtreerija
  Software for filtering noun phrases from running text (UT)
• Inglise-eesti sõnastik
  Implementation software for the English-Estonian dictionary (IEL)
• ESTMORF, eesti keele morfoloogiline süntees ja analüüs
  Morphological synthesis and analysis of Estonian (Filosoft)
• EKI morfoloogiamoodulid
  Modules of Estonian morphology (IEL)
• TAHMM, ESTMORFi tulemuste ühestaja
  Morphological disambiguator for ESTMORF output (Filosoft)
• Eesti keele kõnesüntees ja ekraanilugeja
  Estonian speech synthesiser and screen reader (IOC, IEL, Filosoft)
• ELA, teksti lausestaja
  Sentence splitter for text (UT)
Links: http://www.keelevara.ee/, http://www.eki.ee/dict/,
http://www.eki.ee/keeletehnoloogia/tarkvara.html
Developed by Indrek Hein, Margit Langemets, Ülle Viks