Extending the Italian WordNet with the Specialized Language of the Maritime Domain
Adriana Roventini*, Rita Marinelli*
*Istituto di Linguistica Computazionale del CNR, Pisa, Italy
e-mail: rita.marinelli@ilc.cnr.it - adriana.roventini@ilc.cnr.it
Our purpose
To describe the construction, under way at the Institute for
Computational Linguistics, of a terminological subset belonging to
the maritime lexical domain (in particular, the technical and
commercial/maritime transport domain).
WordNet
• In the Princeton semantic WordNet (Miller et al., 1990) the meanings of
words are represented in terms of their conceptual-semantic and lexical
relations to other words;
• it has been the tool of choice for building Natural Language Processing
(NLP) systems of various kinds.
EWN
The main goals of the EuroWordNet (EWN) are:
• to develop a (multilingual) lexical resource,
retaining the basic underlying design of
WordNet 1.5 (hereafter WN1.5)
• to improve it in order to meet the needs of
research in the field of NLP (Vossen, 1999).
Background
SI-TAL (Integrated System for the Automatic Treatment of Language):
an Italian national project for the development of various integrated
language resources and software tools for the automatic treatment of
written and spoken Italian.
ITALWORDNET (IWN): a lexical-semantic resource developed within the
SI-TAL project, enlarging the first database built in EWN.
[Diagram: IWN in relation to the EWN project and SI-TAL]
• IWN database containing ca. 50,000 synsets:
Nouns
Verbs
Adjectives
Adverbs
Proper Names
Not encoded in
EWN
• IWN links synsets by lexical-semantic relations:
Synonymy and Hyponymy (the most important relations)
Many other semantic relations are encoded for various subsets of Italian
Nouns (Common & Proper), Verbs, Adjectives
• IWN synsets are linked to WordNet 1.5 through a generic ILI
(InterLingual Index)
The IWN linguistic model
• Synsets and synonymy relation
• Synset as the basic notion around which WN, EWN and IWN are built:
a synset is a set of synonymous words belonging to the same
Part-of-Speech (PoS) that can be interchanged in at least one context.
Synsets are connected by semantic relations to other synsets and to the
ILI (an unstructured version of WN 1.5, containing all its synsets but
not the relations among them).
Also inherited from EWN:
• language-internal relations link the language-specific synsets
(mainly hyperonymy/hyponymy or is-a relation, role, cause, purpose,
part relations, etc.)
• equivalence relations link the Italian synsets to the
InterLingual Index (ILI).
Linking our wordnet to the ILI ensures that IWN can be used for
multilingual applications.
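The model described above (variants grouped into a synset, language-internal relations to other synsets, equivalence relations to ILI records) can be pictured with a minimal Python sketch. It is only an illustration: the class name, field names, synset identifiers and the ILI identifier are hypothetical and are not taken from the IWN database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    """A set of same-PoS synonymous variants plus its outgoing links."""
    synset_id: str                 # hypothetical identifier
    pos: str                       # part of speech, e.g. "N" or "V"
    variants: List[str]            # interchangeable lemmas
    # language-internal relations: (relation type, target synset id)
    internal: List[Tuple[str, str]] = field(default_factory=list)
    # equivalence relations: (relation type, ILI record id)
    eq_links: List[Tuple[str, str]] = field(default_factory=list)


# Illustrative data only; identifiers are invented for the example.
nave = Synset("N#1", "N", ["nave"])
traghetto = Synset("N#2", "N", ["traghetto", "nave traghetto"])
traghetto.internal.append(("has_hyperonym", nave.synset_id))
traghetto.eq_links.append(("eq_synonym", "ILI-ferry"))

index: Dict[str, Synset] = {s.synset_id: s for s in (nave, traghetto)}


def hyperonyms(synset: Synset, index: Dict[str, Synset]) -> List[Synset]:
    """Follow has_hyperonym links one step up the is-a hierarchy."""
    return [index[target] for rel, target in synset.internal
            if rel == "has_hyperonym"]


print([s.variants for s in hyperonyms(traghetto, index)])
```

Keeping the equivalence links separate from the internal ones mirrors the split between language-specific structure and the language-neutral ILI.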
Reasons for our choice
• The globalisation of trade, business and travel, and technological
development (growing importance of transport).
• The changes within maritime activity and the related terminology
(remarkable incidence of this lexical domain).
• New techniques of communication, translation and diffusion of terms
(‘monopoly’ of the English language).
Building/structuring the
terminological IWN
• according to the design principles of the
generic wordnet (applying the same
semantic relations model)
• exploiting the possibility - available in IWN
through the Inter-Lingual Index (ILI) - of
linking the specialized terms to the
corresponding closest concepts in English.
Sources
Several information sources have been used to select the base concepts (BCs):
• the Dizionario Globale dei termini marinareschi, edited by the
“Capitaneria del Porto di Livorno”, available online;
• the Dizionario di marina, edited by Barberi Squarotti G., Gallinaro I. (2002);
• the Glossario dello spedizioniere (Annuario Federspedi 1988);
• the Dizionario di termini marittimi mercatili, compiled by P. R. Brodie
and translated by E. Vincenzini, Lloyd’s of London Press, Legal
Publishing and Conferences Division, 1988.
Choice of the base concepts
(BCs)
• design of the top level of the terminological database, identifying
the most relevant and representative domain concepts, or base
concepts (BCs), i.e. those showing a large number of hyponyms and/or
used more frequently in this particular domain of maritime navigation
and transport.
First Base-Concepts
• A first nucleus of over 200 BCs was identified,
such as nave (ship), porto (harbour), ormeggio
(mooring), albero (mast), carico (cargo),
spedizione (shipment), navigazione (navigation),
trasporto (transport), tariffa (tariff), nolo (freight)
and so on, which are sufficiently general and
constitute the root nodes of the specialized
database.
BCs “export/import” as XML files
(see the example below concerning the verb imbarcare/to ship).
[Diagram: IWN → XML → IWNTerm]
• Example of an XML export file: “imbarcare” (to ship)

  <WORD_MEANING ID="V#32560" PART_OF_SPEECH="V">
    <GLOSS />
    <VARIANTS>
      <LITERAL LEMMA="imbarcare" SENSE="1" STATUS="CT" />
    </VARIANTS>
    <INTERNAL_LINKS>
      <RELATION TYPE="xpos_near_synonym" ID="2" INV_ID="2">
        <TARGET_WM ID="27869" PART_OF_SPEECH="N" LEMMA="imbarco" SENSE="1" GLOSS="" />
      </RELATION>
      <RELATION TYPE="has_hyperonym" ID="8" INV_ID="8">
        <TARGET_WM ID="32127" PART_OF_SPEECH="V" LEMMA="fare" SENSE="14" GLOSS="causare un cambiamento in un processo o uno stato (seguito da un infinito)." />
      </RELATION>
      <RELATION TYPE="has_hyponym" ID="10" INV_ID="10">
        <TARGET_WM ID="36489" PART_OF_SPEECH="V" LEMMA="reimbarcare" SENSE="1" GLOSS="" />
      </RELATION>
      <RELATION TYPE="involved_instrument" ID="31" INV_ID="31">
        <TARGET_WM ID="15111" PART_OF_SPEECH="N" LEMMA="imbarcatoio" SENSE="1" GLOSS="" />
      </RELATION>
    </INTERNAL_LINKS>
    <EQ_LINKS>
      <RELATION TYPE="eq_synonym" ID="1" INV_ID="1">
        <TARGET_WM ID="r#1128479" />
      </RELATION>
    </EQ_LINKS>
  </WORD_MEANING>
</WN>
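An export file of this shape could be read back with a few lines of Python. The sketch below is an assumption-laden illustration, not the project's import tool: it uses only the standard library, it shortens the entry for “imbarcare” to two internal relations, and it assumes that the root element is `<WN>` (only the closing tag is visible in the example). The function name `read_word_meanings` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the "imbarcare" entry; the <WN> root is an assumption.
SAMPLE = """<WN>
  <WORD_MEANING ID="V#32560" PART_OF_SPEECH="V">
    <VARIANTS>
      <LITERAL LEMMA="imbarcare" SENSE="1" STATUS="CT" />
    </VARIANTS>
    <INTERNAL_LINKS>
      <RELATION TYPE="has_hyperonym" ID="8" INV_ID="8">
        <TARGET_WM ID="32127" PART_OF_SPEECH="V" LEMMA="fare" SENSE="14" />
      </RELATION>
      <RELATION TYPE="has_hyponym" ID="10" INV_ID="10">
        <TARGET_WM ID="36489" PART_OF_SPEECH="V" LEMMA="reimbarcare" SENSE="1" />
      </RELATION>
    </INTERNAL_LINKS>
    <EQ_LINKS>
      <RELATION TYPE="eq_synonym" ID="1" INV_ID="1">
        <TARGET_WM ID="r#1128479" />
      </RELATION>
    </EQ_LINKS>
  </WORD_MEANING>
</WN>"""


def read_word_meanings(xml_text):
    """Return one dict per WORD_MEANING with its lemmas and links."""
    root = ET.fromstring(xml_text)
    entries = []
    for wm in root.iter("WORD_MEANING"):
        lemmas = [lit.get("LEMMA") for lit in wm.findall("VARIANTS/LITERAL")]
        # Internal links: relation type plus the lemma of the target synset.
        internal = [(rel.get("TYPE"), rel.find("TARGET_WM").get("LEMMA"))
                    for rel in wm.findall("INTERNAL_LINKS/RELATION")]
        # Equivalence links: relation type plus the ILI record identifier.
        eq = [(rel.get("TYPE"), rel.find("TARGET_WM").get("ID"))
              for rel in wm.findall("EQ_LINKS/RELATION")]
        entries.append({"id": wm.get("ID"), "pos": wm.get("PART_OF_SPEECH"),
                        "lemmas": lemmas, "internal": internal, "eq": eq})
    return entries


entries = read_word_meanings(SAMPLE)
for entry in entries:
    print(entry["lemmas"], entry["internal"], entry["eq"])
```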
New BCs
• Other BCs were added from scratch (“ex novo”): they were not present
with their maritime senses in the generic database, but are very
frequently used and representative of this specific domain, for
instance: nolo (freight), classe (class), fanale (light), punto
(position), destino (destination), agente marittimo (shipping agent),
spedizioniere (freight forwarder).
Example “Punto (Position)”
Use of Relations to codify specialized terms
The first nucleus of terms was enlarged by encoding hyponyms and using
other semantic relations.
Example “Ormeggio (Mooring)”
Kind and Number of Terms
• 2227 lemmas, corresponding to 1721 synsets and 2355 word senses,
belonging to the maritime (technical/nautical and maritime transport)
domain, all linked to the generic wordnet.
• Terms belonging to all the grammatical categories (nouns, verbs,
adjectives, adverbs and a small set of proper names) have been codified
in the terminological database (3971 relations).
Example “Porto (Harbour)”
Polylexical Units
Base Concepts (BCs) as the root of a terminological sub-hierarchy:
(in many cases) hyponym = BC + adjective or prepositional phrase
For instance:
carico (cargo):
carico completo (full cargo), carico di merci varie (general cargo),
carico in coperta (deck cargo), carico parziale (part load cargo);
tariffa (tariff):
tariffa doganale (customs tariff), tariffa di trasporto (transport tariff),
tariffa forfettaria (flat-rate tariff);
nolo (freight):
nolo anticipato (freight prepaid), nolo intero (full freight), nolo secondo il
valore (ad valorem freight), nolo a destino (freight payable at destination).
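The pattern "hyponym = BC + adjective or prepositional phrase" suggests a simple way to propose candidate hyponyms from a term list. The Python sketch below is a heuristic illustration only, assuming that the BC is the first word of the multiword term. The function name is hypothetical, and the candidates it returns would still need checking by a lexicographer.

```python
from collections import defaultdict

# Terms taken from the examples above; "porto" has no multiword form here.
BASE_CONCEPTS = ["carico", "tariffa", "nolo"]
TERMS = [
    "carico completo", "carico in coperta",
    "tariffa doganale",
    "nolo anticipato", "nolo a destino",
    "porto",
]


def group_by_head(terms, base_concepts):
    """Group multiword terms under the base concept they start with."""
    heads = set(base_concepts)
    groups = defaultdict(list)
    for term in terms:
        words = term.split()
        # A candidate hyponym is a BC followed by at least one more word.
        if len(words) > 1 and words[0] in heads:
            groups[words[0]].append(term)
    return dict(groups)


candidates = group_by_head(TERMS, BASE_CONCEPTS)
for head, hyponyms in candidates.items():
    print(head, "->", hyponyms)
```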
Linking Terms to the ILI
• In practice, the English term or multiword (or its acronym) is often
better known and much more widely used than the Italian one in maritime
transport activity.
• Difficulty in finding synonyms: both the English term (or multiword)
and the Italian one are included in the synset as variants (we thought
this could also be useful to non-professionals).
EXAMPLES:
• RO-RO (Roll On/Roll Off) usually indicates a nave traghetto per
automezzi (ferry for vehicle transport),
• the abbreviation FOB (Free On Board) is used to mean con le spese
pagate fino a bordo (loading costs paid up to the ship’s broadside),
• CIF (Cost, Insurance and Freight) is used to mean costi fino a bordo
più assicurazione e nolo mare pagati (loading costs, insurance and
sea freight prepaid).
The Link Structure
• the BCs identified for this terminological
lexicon constitute the top level and are the
root nodes for the plug-in operation which
allows linking between the generic and the
specialized wordnet.
Two types of plug-in relations are codified
• the eq-plug-in relation, an equivalence (synonymy) relation between
synsets of the two databases
• the has-hyperonym(hyponym)-plug relation, an equivalence
hyperonymy/hyponymy relation between synsets of the two databases.
Tool Facilities:
• simultaneous parallel consultation of the two databases, to facilitate
insertion of the relations
• integrated search across the two databases: if the lemma is found in
both databases and there is an eq-plug-in relation between the synsets,
the synset belonging to the specific domain eclipses the generic one in
the integrated search.
Tool Facilities:
downward and horizontal relations (part-of relations,
role relations, cause relations, derivation, etc.) are taken
from the terminological wordnet.
upward (hyperonymy) relations are taken from the
generic one.
It is possible to access the generic database or the terminological
database or both databases at the same time.
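The behaviour of the integrated search can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: each database is modelled as a plain dictionary from lemma to synset, the eq-plug-in links as a set of identifier pairs, and the function name, identifiers and relation targets are all hypothetical.

```python
UPWARD = {"has_hyperonym"}  # relation types treated as upward


def integrated_lookup(lemma, generic, terminological, eq_plug_in):
    """Return the synset views visible for a lemma in the integrated search."""
    g = generic.get(lemma)
    t = terminological.get(lemma)
    if g and t and (t["id"], g["id"]) in eq_plug_in:
        # The specialized synset eclipses the generic one: upward relations
        # come from the generic wordnet, all others from the terminological.
        upward = [r for r in g["relations"] if r[0] in UPWARD]
        other = [r for r in t["relations"] if r[0] not in UPWARD]
        return [{"id": t["id"], "source": "plugged",
                 "relations": upward + other}]
    # No plug-in link: show whatever each database holds, unmerged.
    return [dict(s, source=name)
            for name, s in (("terminological", t), ("generic", g)) if s]


# Illustrative data; identifiers and relation targets are invented.
generic = {"nolo": {"id": "G1", "relations": [
    ("has_hyperonym", "generic-hyperonym"),
    ("has_hyponym", "generic-hyponym")]}}
terminological = {
    "nolo": {"id": "T1", "relations": [
        ("has_hyponym", "nolo anticipato"),
        ("has_hyponym", "nolo intero")]},
    "bussola": {"id": "T2", "relations": []},
}
eq_plug_in = {("T1", "G1")}

nolo_view = integrated_lookup("nolo", generic, terminological, eq_plug_in)
bussola_view = integrated_lookup("bussola", generic, terminological, eq_plug_in)
print(nolo_view)
print(bussola_view)
```

In this sketch the generic hyponym of "nolo" is hidden once the plug-in applies, which is one possible reading of "eclipses".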
EXAMPLE “Nolo (Freight)”
“Nolo” plug-in (with downward relations)
“Nolo” plug-in (with upward relations)
EXAMPLE “Bussola (Compass)”
“Bussola” plug-in (with downward relations)
“Bussola” plug-in (with upward relations)
Differences between IWN and
Dictionaries/Glossaries
The data are not only described (by the definition) but also codified
(by relations).
Data structured only alphabetically in the dictionary edited by the
Harbour Master (where, for example, all the information about ‘bussola’
appears together and rather jumbled) become, in a relational database,
synsets linked to each other by many types of semantic relations
(hyperonymy, hyponymy, holo/mero part, etc.), which can also be managed
automatically.
FINAL REMARKS
• maritime terminology is an object of great interest in a maritime
nation like Italy, which has a strong maritime tradition
• the English terms prevail over their Italian synonyms
• maritime terminology dictionaries are rare, and it is sometimes very
difficult to find an English translation of these terms
Instrument for work…
The possibility of having definitions and translations of specific
terms is a useful instrument for work (export-import companies,
maritime agencies, etc.), for school and teaching activities of various
types (nautical institutes, professional training, etc.) and, in
general, whenever a reference to terms of this specific domain is needed.
• From a ‘commercial’ point of view, the English language prevails over
all other languages: contracts, negotiations, chartering and operation
documents of cargo ships (such as bills of lading) are in English, and
so are a great number of reference books.
• From the point of view of ‘usefulness’, there are circumstances in
which it is necessary to refer to a translation of technical terms that
is correct, up to date and absolutely unambiguous.
Our aim
• to build a terminological database showing the semantic relations
between different concepts and a precise, correct linkage to the
English terms, and then to make it a point of reference in
circumstances such as legal actions, for instance, when the judge…..
• to carry on this research, increasing the number of terms and starting
a cooperation with the official transport organizations, in order to
enrich and refine this product and arrive at a recognized and validated
definitive version.
• to start this kind of research for the Italian language.
Results
• Specialized lexicon enlarged
• Italian terms clarified
• More effective management of
Italian terms and English terms
In spite of globalisation, in a
maritime country like ours it
is absolutely essential not to
lose our linguistic identity