From: AAAI Technical Report SS-93-02. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

Using Distributed Patterns as Language Independent Lexical Representations¹

Richard F.E. Sutcliffe, Annette McElligott, Gerry Ó Néill
Department of Computer Science and Information Systems
University of Limerick
Limerick, Ireland

1 Introduction

While it is possible to construct Machine Translation (MT) systems of surprising sophistication using the technique of transfer between augmented parse trees (Brockmann, 1991), few people would doubt that in order to perform a fully satisfactory translation it will ultimately be necessary to work with meaning representations. However, there are many problems with developing a computationally tractable representation scheme for linguistic meanings, either at the sentential (propositional) or lexical levels.

One approach to the problem of capturing meanings at the lexical level is to use a form of distributed representation where each word meaning is converted into a point in an n-dimensional space (Sutcliffe, 1992a). Such representations can capture a wide variety of word meanings within the same formalism. In addition, they can be used within distributed representations for capturing higher level information such as that expressed by sentences (Sutcliffe, 1991a). Moreover, they can be scaled to suit a particular tradeoff of specificity and memory usage (Sutcliffe, 1991b). Finally, distributed representations can be processed conveniently by vector processing methods or connectionist algorithms and can be used either as part of a symbolic system (Sutcliffe, 1992b) or within a connectionist architecture (Sutcliffe, 1988).

A further point of interest regarding distributed representations for nouns is that they can be extracted automatically from machine readable dictionaries. As is well known, dictionaries contain much taxonomic information (Amsler, 1984).
This information can be extracted automatically by parsing the dictionary definitions and extracting taxonomic pointers from them in a recursive fashion. For example, both Vossen (1990) and Guthrie et al. (1990) constructed exhaustive taxonomies for the Longman Dictionary of Contemporary English in this way. We have shown one manner in which it is possible to traverse such a taxonomy, determining both the features germane to a particular concept and their appropriate centralities (strengths). In the present work we have taken this idea further by showing how lexicons for different languages can be mapped into the same feature space, thus allowing lexical translation between those languages.

¹ We are most grateful to Tom Brehony, Colman Collins, Denis Hickey, Ken Litkowski, Tony Molloy, Redmond O'Brien and Piek Vossen for their help with this research. The authors' emails are sutcliffer@ul.ie, mcelligotta@ul.ie and oneillg@ul.ie

cealg (treachery)           iúr (yew)
sleight       0.495074      arborvitae    0.421825
subterfuge    0.386588      larch         0.421825
shenanigan    0.281638      myrtle        0.421825
circlet       0.257248      box           0.36855
folder        0.257248      carob         0.36855
ring          0.257248      bock          0.344418
round         0.257248      cormorant     0.344418
bacillus      0.191211      tone          0.344418
atoll         0.181902      periwinkle    0.344418
beanie        0.181902      winkle        0.344418

Example System Outputs

2 The Extraction Algorithm

Our original extraction algorithm was very simple (Sutcliffe, 1992c). Suppose the dictionary contains the definitions "An Afghan is a large dog", "A dog is a furry animal", and "An animal is a sentient entity". A representation for Afghan can be constructed by asserting that it is [+large +furry +sentient]. We opt to weight these attributes according to their distance from the concept being defined, giving: [<large, 1.0> <furry, 0.9> <sentient, 0.8>]. Such a representation can be considered as a vector in an n-dimensional space, where n is the total number of distinct features extracted from the dictionary by repeating the above procedure for all noun entries.
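The Afghan example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DEFINITIONS` table, the function name `extract_features` and the linear decrement of 0.1 per taxonomic step are hypothetical choices made to reproduce the weights 1.0, 0.9 and 0.8 quoted in the text. The text gives only those three weights and does not state the weighting function itself.

```python
# Hypothetical parsed dictionary: noun -> (attributes, genus term).
# In a real system these pairs would come from parsing dictionary definitions.
DEFINITIONS = {
    "afghan": (["large"], "dog"),
    "dog": (["furry"], "animal"),
    "animal": (["sentient"], "entity"),
}


def extract_features(noun, definitions, start=1.0, step=0.1):
    """Follow genus links upward from `noun`, collecting weighted features.

    Assumption: the weight falls linearly by `step` per taxonomic level,
    which matches the 1.0 / 0.9 / 0.8 example but is not confirmed as the
    general rule.
    """
    features = {}
    seen = set()  # guards against circular definitions
    current = noun
    depth = 0
    while current in definitions and current not in seen:
        weight = round(start - depth * step, 10)
        if weight <= 0:
            break
        seen.add(current)
        attributes, genus = definitions[current]
        for attribute in attributes:
            # Keep the weight from the nearest level if a feature recurs.
            features.setdefault(attribute, weight)
        current = genus
        depth += 1
    return features


afghan = extract_features("afghan", DEFINITIONS)
```

Here `afghan` holds one weighted entry per feature; collecting the distinct feature names over all nouns would give the dimensions of the space.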
Once such representations have been normalised (scaled to length one) we can determine how similar in meaning a pair of concepts is by computing, say, the dot product of their representations.

3 The Translation Method

In an initial experiment we have investigated whether it is possible to use the above extraction algorithm as a basis for lexical translation. The method was as follows. First, we constructed a set of representations for 4263 nouns in the Merriam Webster Compact Electronic Dictionary (CED) in terms of a set of 2013 English features. Next, we constructed a set of representations for 2928 Irish words in terms of 505 Irish features extracted from the Foclóir Beag dictionary. Thirdly, we devised by hand a translation which maps each Irish feature onto one or more of the 2013 English features. Finally, we converted each Irish word representation from Irish features to English features.

The reason for allowing one source language feature to map to several target language features is that in many cases there was no exact translation between members of the two feature sets. In such cases we scaled the centralities of the selected target features so that their length taken as a whole was the same as the 'length' of the original source feature. Thus in translating [geal, 0.601] using the English features 'bright', 'bright-colored', 'brilliant', 'light' and 'light-colored' we added the following set to the target representation: [[bright, 0.269], [bright-colored, 0.269], [brilliant, 0.269], [light, 0.269], [light-colored, 0.269]]. Each of the five target features thus receives 0.601/√5 ≈ 0.269. The purpose here was to ensure that the overall contribution to meaning of the target English feature list was the same as that of the original source Irish feature.

The result of these stages is that we have a set of English words and a set of Irish words, each of whose meanings is defined as some point in a common interlingual space defined by the English feature set.
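The feature conversion and the similarity measure described above can be illustrated as follows. This is a minimal sketch, not the system itself: the names `FEATURE_MAP`, `translate_features`, `normalise` and `similarity` are hypothetical, and two behaviours are assumptions not stated in the text (source features with no mapping are dropped, and contributions to the same target feature are summed). The scaling step divides a centrality by the square root of the number of target features, which reproduces the 0.269 value in the geal example.

```python
import math

# Hypothetical hand-built mapping from Irish features to English features.
FEATURE_MAP = {
    "geal": ["bright", "bright-colored", "brilliant", "light", "light-colored"],
}


def translate_features(source_vector, feature_map):
    """Re-express a source-language vector in target-language features.

    A centrality c mapped onto k target features gives each c / sqrt(k),
    so the k new components together have Euclidean length c.
    """
    target = {}
    for feature, centrality in source_vector.items():
        targets = feature_map.get(feature, [])
        if not targets:
            continue  # assumption: unmapped features are simply dropped
        share = centrality / math.sqrt(len(targets))
        for name in targets:
            # assumption: overlapping contributions are summed
            target[name] = target.get(name, 0.0) + share
    return target


def normalise(vector):
    """Scale a sparse vector to length one (zero vectors are left alone)."""
    length = math.sqrt(sum(v * v for v in vector.values()))
    if length == 0:
        return dict(vector)
    return {f: v / length for f, v in vector.items()}


def similarity(a, b):
    """Dot product of two sparse vectors stored as dictionaries."""
    return sum(w * b.get(f, 0.0) for f, w in a.items())


translated = translate_features({"geal": 0.601}, FEATURE_MAP)
```

Because both lexicons end up in the same feature space, `similarity(normalise(x), normalise(y))` can be applied directly to an Irish vector `x` and an English vector `y`.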
Now it is possible to translate from, say, Irish to English in the following manner. We select the Irish source word and retrieve its representation. This is then compared with the representation of each English word in the lexicon. We select as the English translation the word whose representation best matches that of the original Irish word.

4 Results and Discussion

The figure shows two example outputs of the system, translating from Irish into English. As can be seen, the system is selecting sensible translations for the source words and the results are in general good enough to encourage further work. However, there are still many problems with the system. There are two main reasons for this. Firstly, the parsers we are using can only recognise noun phrases and thus reject a large proportion of the noun definitions. As a result, our lexicon for each language is only based on a part of the complete taxonomy defined by the parent dictionary. Secondly, we have not as yet performed any post-processing on the distributed representations constructed from the taxonomies. For example, many features are synonyms of each other, meaning that if one concept is, say, 'big' but not 'large' while another is 'large' but not 'big', the similarity between the two is not captured by the representations. We are working on algorithms which collapse such synonyms, thus increasing the average frequency with which a feature occurs and improving the performance of the system.

As we have noted earlier, the translation between features in the source language and those of the target language is at present accomplished by hand. However, a bilingual dictionary could also be used for this purpose.
It should be noted that such a dictionary can be much more modest in scope than either of the monolingual dictionaries participating in any language mapping, and that errors in translation caused by, say, problems in resolving source or target word senses are much less injurious to the integrity of the translation than would be the case if a bilingual dictionary were being used for the translation per se.

In summary, we have presented a method for mapping between noun senses in pairs of monolingual dictionaries. The method involves constructing distributed representations for noun meanings in terms of the appropriate host language features, and then translating features in one language into the other. We have demonstrated the method using words from the CED and Foclóir Beag dictionaries.

5 References

Amsler, R.A. (1984). Machine-Readable Dictionaries. Annual Review of Information Science and Technology (ARIST), 19, 161-209.

Brockmann, D. (1991). Transfer Problems. Internal Memorandum ET IM 026, Computational Linguistics and Machine Translation Group, Department of Language and Linguistics, University of Essex, UK.

Evens, M. (1989). Computer-Readable Dictionaries. Annual Review of Information Science and Technology (ARIST), 24, 85-117.

Guthrie, L., Slator, B.M., Wilks, Y. and Bruce, R. (1990). Is there Content in Empty Heads? Proceedings of the 13th International Conference on Computational Linguistics (COLING-90), Helsinki, Finland, 3, 138-143.

Sutcliffe, R.F.E. (1988). A Parallel Distributed Processing Approach to the Representation of Knowledge for Natural Language Understanding. Unpublished doctoral thesis, University of Essex, UK.

Sutcliffe, R.F.E. (1991a). Distributed Representations in a Text Based Information Retrieval System: A New Way of Using the Vector Space Model. In Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, Chicago, IL, October 13-16, 1991. New York, NY: ACM Press, pp. 123-132.
Also available as Technical Report UL-CSIS-91-6, Department of Computer Science and Information Systems, University of Limerick, March 1991.

Sutcliffe, R.F.E. (1991b). Distributed Subsymbolic Representations for Natural Language: How many Features Do You Need? Proceedings of the 3rd Irish Conference on Artificial Intelligence and Cognitive Science, 20-21 September 1990, University of Ulster at Jordanstown, Northern Ireland. Berlin, FRG, Heidelberg, FRG, New York, NY: Springer-Verlag. Also available as Technical Report UL-CSIS-90-6, Department of Computer Science and Information Systems, University of Limerick, April 1990.

Sutcliffe, R.F.E. (1992a). Representing Meaning using Microfeatures. In R. Reilly and N.E. Sharkey (eds) Connectionist Approaches to Natural Language Processing. Englewood Cliffs, NJ: Lawrence Erlbaum Associates. Also available as Technical Report UL-CSIS-90-2, Department of Computer Science and Information Systems, University of Limerick, January 1990.

Sutcliffe, R.F.E. (1992b). PELICAN: A Prototype Information Retrieval System using Distributed Propositional Representations. To appear in H. Sorensen (ed.) Proceedings of AICS-91 - The Fourth Irish Conference on Artificial Intelligence and Cognitive Science, University College Cork, 19-20 September 1991. London, UK, Berlin, FRG, Heidelberg, FRG, New York, NY: Springer-Verlag. Also available as Technical Report UL-CSIS-91-9, Department of Computer Science and Information Systems, University of Limerick, April 1991.

Sutcliffe, R.F.E. (1992c). Constructing Distributed Semantic Lexical Representations using Machine Readable Dictionary. To appear in K.T. Ryan and R.F.E. Sutcliffe (eds) Proceedings of AICS-91 - The Fifth Irish Conference on Artificial Intelligence and Cognitive Science, University of Limerick, 10-11 September 1992. London, UK, Berlin, FRG, Heidelberg, FRG, New York, NY: Springer-Verlag.
Also available as Technical Report UL-CSIS-92-8, Department of Computer Science and Information Systems, University of Limerick, April 1992.

Vossen, P. (1990). The End of the Chain: Where Does Decomposition of Lexical Knowledge Lead us Eventually? Esprit BRA-3030 ACQUILEX WP010. English Department, University of Amsterdam, The Netherlands.

Wilks, Y., Fass, D., Guo, C.-M., Macdonald, J., Plate, T. and Slator, B. (1990). Providing Machine Tractable Dictionary Tools. Machine Translation, 5, 99-154.