From: AAAI Technical Report SS-93-02. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
Using Distributed
Patterns as Language
Independent Lexical Representations
1Richard F.E. Sutcliffe
Annette McElligott
Gerry () N~ill
Department of Computer Science
and Information
Systems
University of Limerick
Limerick, Ireland
1
Introduction
While it is possible to construct Machine Translation (MT) systems of surprising sophistication using the technique of transfer between augmented parse trees (Brockmann, 1991) few
people would doubt that in order to perform a fully satisfactory translation it will ultimately
be necessary to work with meaning representations.
However, there are many problems with
developing a computationally tractable representation scheme for linguistic meanings either at
the sentential (propositional) or lexical levels. One approach to the problem of capturing meanings at the lexical level is to use a form of distributed representation where each word meaning
is converted into a point in an n-dimensional space (Sutcliffe, 1992a). Such representations
can capture a wide variety of word meanings within the same formalism. In addition they can
be used within distributed representations for capturing higher level information such as that
expressed by sentences (SutcliiTe, 1991a). Moreover, they can be scaled to suit a particular
tradeoff of specificity and memoryusage (Sutcliffe, 1991b). Finally, distributed representations
can be processed conveniently by vector processing methods or connectionist algorithms and
can be used either as part of a symbolic system (Sutcliffe, 1992b) or within a connectionist
architecture (Sutcliffe, 1988).
A further point of interest regarding distributed representations for nouns is that they can
be extracted automatically from machine readable dictionaries.
As is well known, dictionaries contain much taxonomic information (Amsler, 1984). This information can be extracted
automatically by parsing the dictionary definitions and extracting taxonomic pointers from
them in a recursive fashion. For example both Vossen (1990) and Guthrie et al. (1990)
constructed exhaustive taxonomies for the Longmans Dictionary of Contemporary English in
this way. Wehave shown one manner in which it is possible to traverse such a taxonomy determining both the features germane to a particular concept and their appropriate centralities
(strengths). In the present work we have taken this idea further by showing how lexicons
different languages can be mappedinto the same feature space thus allowing lexical translation
between those languages.
1Weare mostgrateful to TomBrehony,ColmanCollins, Denis Hickey, Ken Litkowski, TonyMolloy, RedmondO’BrienandPiek Vossenfor their help with this research. Theauthors’emails are sutcliffer~ul.ie,
mcelligotta~ul.ieandoneillg~ul.ie
55
cealg (treachery)
iu’r (yew)
sleight 0.495074
subterfuge0.386588
shenanigan0.281638
circlet 0.257248
folder 0.257248
ring 0.257248
round 0.257248
bacillus 0.191211
atoll 0.181902
beanie 0.181902
arborvitae0.421825
larch 0.421825
myrtle 0.421825
box 0.36855
carob 0.36855
bock 0.344418
cormorant0.344418
tone 0.344418
periwinkle0.344418
winkle 0.344418
Example System Outputs
2
The Extraction
Algorithm
Our original extraction algorithm was very simple (Sutcliffe 1992c). Suppose the dictionary
contains the definitions
"An Afghan is a large dog", "A dog is a furry animal", and "An
animal is a sentient entity". A representation for Afghan can be constructed by asserting
that it is [-{-large +furry -{-sentient]. Weopt to weight these attributes according to their
distance from the concept being defined, giving: [<large, 1.0> <furry, 0.9> <sentient,
0.8>]. Such a representation can be considered as a vector in an n-dimensional space, where
n is the total number of distinct features extracted from the dictionary by repeating the above
procedure for all noun entries. Once such representations
have been normalised (scaled to
length one) we can determine how similar in meaning a pair of concepts are by computing, say,
the dot product of their representations.
3
The Translation
Method
In an initial experiment we have investigated whether it is possible to use the above extraction
algorithm as a basis for lexical translation. The method was as follows. First we constructed a
set of representations for 4263 nouns in the Merriam Webster Compact Electronic Dictionary
in terms of a set of 2013 English features. Next we constructed a set of representations for 2928
Irish words in terms of 505 Irish features extracted from the Focl6ir Beag dictionary. Thirdly
we devised by hand a translation which maps each Irish feature onto one or more of the 2013
English features. Finally we converted each Irish word representation from Irish features to
English features. The reason for allowing one source language feature to map to several target
language features is that in many cases there was no exact translation between membersof the
two feature sets. In such cases we scaled the centralities of the selected target features so that
their length taken as a whole was the same as the ’length’ of the original source feature. Thus
in translating [geal,0.601] using the English features ’bright’, ’bright-colored’, ’brilliant’, ’light’
and ’light-colored’ we added the following set to the target representation: [[bright, 0.269],
[bright-colored, 269], [brilliant,0.269],
[light, 0.269], [light-colored, 0.269]]. The purpose here
was to ensure that the overall contribution to meaning of the target English feature list was
the same as that of the original source Irish feature.
The result of these stages is that we have a set of English words and a set of Irish words
56
each of whose meanings is defined as some point in a commoninterlingual space defined by the
English feature set. Nowit is possible to translate from, say, Irish to English in the following
manner. Weselect the Irish source word and retrieve its representation. This is then compared
with the representation of each English word in the lexicon. Weselect as the English translation
the word whose representation best matches that of the original Irish word.
4
Results
and Discussion
The figure shows two example outputs of the system, translating from Irish into English. As
can be seen, the system is selecting sensible translations for the source words and the results
are in general good enough to encourage further work. However, there are still many problems
with the system. There are two main reasons for this. Firstly, the parsers we are using can
only recognise noun phrases and thus reject a large proportion of the noun definitions. As a
result our lexicon for each language is only based on a part of the complete taxonomy defined
by the parent dictionary. Secondly we have not as yet performed any post-processing on the
distributed representations constructed from the taxonomies. For example many features are
synonymsof each other, meaning that if one concept is, say, ’big’ but not ’large’ while another
is ’large’ but not ’big’ the similarity between the two is not captured by the representations. We
are working on algorithms which collapse such synonyms thus increasing the average frequency
with which a feature occurs and thus improving the performance of the system.
As we have noted earlier, the translation between features in the source language and those
of the target language is at present accomplished by hand. However a bilingual dictionary
could also be used for this purpose. It should be noted that such a dictionary can be much
more modest in scope than either of the monolingual dictionaries participating in any language
mapping, and that errors in translation caused by, say, problems in resolving source or target
word senses are muchless injurious to the integrity of the translation than would be the case
if a bilingual dictionary were being used for the translation per se.
In summary, we have presented a method for mapping between noun senses in pairs of
monolingual dictionaries.
The method involves constructing distributed representations for
noun meanings in terms of the appropriate host language features, and then translating features
in one language into the other. We have demonstrated the method using words from the CED
and Focl6ir Beag dictionaries.
5
References
Amsler, R.A. (1984). Machine-Readable Dictionaries.
and Technology (ARIST), 19, 161-209..
Annual Review of Information
Science
Brockmann, D. (1991). Transfer Problems. Internal
Memorandum ET IM 026, Computational Linguistics and Machine Translation Group, Department of Language and Linguistics,
University of Essex, UK.
Evens, M. (1989). Computer-Readable Dictionaries.
and Technology (ARIST), 24, 85-117.
Annual Review of Information
Science
Guthrie, L, Slator, B.M, Wilks, Y. and Bruce, R. (1990). Is there Content in Empty Heads?
Proceedings of the 13th International Conference on Computational Linguistics (COLING-90),
Helsinki, Finland, 3, 138-143.
Sutcliffe,
R.F.E. (1988). A Parallel
Distributed Processing Approach to the Representation
57
Knowledgefor Natural LanguageUnderstanding. Unpublished doctoral thesis, University of
Essex, UK.
Sutcliffe, R.F.E. (1991a). Distributed Representations in a Text Based Information Retrieval
System: A NewWayof Using the Vector Space Model. In Proceedings of the Fourteenth Annual
International A CM/SIGIRConference on Research and Developmentin Information Retrieval,
Chicago, II., October 13-16, 1991. NewYork, NY:ACM
Press, pp. 123-132. Also available as
Technical Report UL-CSIS-91-6, Department of ComputerScience and Information Systems,
University of Limerick, March1991.
Sutcliffe, R.F.E. (1991b). Distributed Subsymbolic Representations for Natural Language:
Howmany Features Do You Need? Proceedings of the 3rd Irish Conference on Artificial
Intelligence and Cognitive Science, 20-21 September1990, University of Ulster at Jordanstown,
Northern Ireland. Berlin, FRG, Heidelberg, FRG, NewYork, NY: Springer Verlag. Also
available as Technical Report UL-CSIS-90-6,Departmentof ComputerScience and Information
Systems, University of Limerick, April 1990.
Sutcliffe, R.F.E. (1992a). Representing Meaningusing Microfeatures. In R. Reilly and N.E.
Sharkey (eds) Connectionist Approaches to Natural LanguageProcessing. EnglewoodCliffs,
N J: LawrenceErlbaumAssociates. Also available as Technical Report UL-CSIS-90-2,Department of ComputerScience and Information Systems, University of Limerick, January 1990.
Sutcliffe, R.F.E. (1992b). PELICAN:
A Prototype Information Retrieval System using Distributed Propositional Representations. To Appear in H. Sorensen (Ed.) Proceedings of AICS91 - The Fourth Irish Conferenceon Artificial Intelligence and Cognitive Science, University
College Cork, 19-20 September 1991. London, UK, Berlin, FRG,Heidelberg, FGR,NewYork,
NY:Springer-Verlag. Also available as Technical Report UL-CSIS-91-9,Department of Computer Science and Information Systems, University of Limerick, April 1991.
Sutcliffe, R.F.E. (1992c). Constructing Distributed SemanticLexical Representations using
MachineReadable Dictionary. To appear in K.T. Ryan and R.F.E. Sutcliffe (eds) Proceedings
of AICS-91- The Fifth Irish Conferenceon Artificial Intelligence and Cognitive Science, University of Limerick, 10-11 September I992. London, UK, Berlin, FRG,Heidelberg, FGR,New
York, NY:Springer-Verlag. Also available as Technical Report UL-CSIS-92-8,Departmentof
ComputerScience and Information Systems, University of Limerick, April 1992.
Vossen, P. (1990). The End of the Chain: Where Does Decomposition of Lexical Knowledge
Lead us Eventually? Esprit BRA-3030ACQUILEX
WP010. English Department, University
of Amsterdam,The Netherlands.
Wilks, Y., Fass, D., Guo, C.-M., Macdonald,J., Plate, T. and Slator, B. (1990). Providing
MachineTractable Dictionary Tools. Machine7~canslation, 5, 99-154.
58