Linking of Hindi wordnet (version 1.2) to the Princeton WordNet (version 2.1)

Jaya Saraswati, Rajita Shukla, Ripple P. Goyal, Pushpak Bhattacharyya

Scenario in a Multilingual Country like India
o 22 official languages and hundreds of dialects
o Several linguistic families: Indo-European (Indo-Aryan), Dravidian, Austro-Asiatic, Tibeto-Burman

Outline
1. Need for Linkage
2. Challenges in Linkage
3. Solutions
4. WN Synset Linkage Tool
5. Statistics

Need for Linkage
o Creation of bilingual dictionaries
o NLP tasks like Machine Translation and Cross-Lingual Information Retrieval
o Word Sense Disambiguation even in the absence of sense-tagged corpora in the target language
o Creation of a wide wordnet grid of shared concepts

Challenges in Linkage
o Kinship Relations
o Musical Instruments
o Kitchen Utensils
o Tools
o Species
o Grains
o Castes
o Occupations
o Wages
o Women denoting Caste and Occupation

Kinship Relations
o चाचा (caacaa), मौसा (mausaa), फूफा (phoophaa), मामा (maamaa): all correspond to the single English word "uncle"

Musical Instruments
o तबला (tabalaa), नगाड़ा (nagaadaa), मृदंग (mridang), ढोल (dhol): all correspond to the single English word "drum"

Kitchen Utensils
◦ Specific utensils: डोंगा (dongaa - bowl); कटोरदान (katoradaan - container)
◦ Size difference: कलछा (kalachhaa - big ladle); कलछी (kalachhii - small ladle)

Tools
o Problem of exact matches in English: कनखोदनी (kanakhodanii); अंकुसी (ankusii) (very specific kinds of tools)
o Size difference: खुरपा (khurpaa - big spud); खुरपी (khurpii - small spud)

Species
o English WordNet does not always have synsets for the male and female of the species: मेंढक (meⁿḍhaka - male frog); मेंढकी (meⁿḍhakii - female frog)
o Some English concepts do not have separate synsets for the species and the male of the species: शेर (śera - denoting the species tiger); शेर (śera - denoting male tiger)

Grains
o Millet: ज्वार, बाजरा, मँड़ुआ

Castes
o लुहार (luhaara - a member of the caste of the ironsmiths)
o धोबी (dhobi - a member of the caste of people who wash clothes)

Occupations
o लुहारी (luhaarii - occupation/work of an ironsmith)
o सुनारी (sunaarii - occupation/work of a goldsmith)

Wages
o ढुलाई (dhulaaii - wages for carrying/transporting)
o पुताई (putaaii - wages for house painting)

Women denoting Caste and Occupation
o Women of various castes: धोबिन (dhobina - a woman belonging to the caste of the washermen)
o Wives of men from a certain caste or profession: धोबिन (dhobina - wife of a washerman)

Solutions
Two kinds of linkages:
◦ Direct Linkage for synsets having exact equivalents in English
◦ Hypernymy Linkage for synsets which cannot be linked directly to English concepts

Examples of hypernymy linkage:
◦ चाचा (caacaa) and मामा (maamaa) - to be linked to uncle
◦ तबला (tabalaa) etc. - to be linked to drum
◦ डोंगा (dongaa) - to be linked to tableware
◦ कनखोदनी (kanakhodanii) - to be linked to tool
◦ ज्वार (jwaara), बाजरा (baajaraa) - to be linked to millet
o Terms denoting caste - to be linked to jati
o Terms denoting professions - to be linked to occupation
o Terms denoting remunerations - to be linked to wage
o Terms for women of various castes - to be linked to jati
o Terms for wives of men belonging to various castes and occupations - to be linked to wife

Size Differentiation in Tools and Utensils
o Direct linkage for the more popular term (as in खुरपी khurpii)
o Hypernymy linkage to be used for the other (as in खुरपा khurpaa)

Species and the male of the species
o Direct linkage for the term denoting the species (शेर śera - linked to tiger)
o Hypernymy linkage to be used to denote the male (शेर śera - again linked to tiger)

Statistics
Total Hindi synsets                                  34343
Number of synsets linked                             15091
    Direct linked                                    15071
    Hypernymy linked                                    20
Number of synsets skipped                            15550
Number of synsets left for first consideration        3702

o Linking of the Hindi wordnet to the English wordnet
o The challenges therein
o The solutions: the strategy of using Direct and Hypernymy Linkages helps in maximizing linkages

References
Arun Karthikeyan Karra. 2010. WordNet Linking. Master of Technology Dissertation, CSE Department, IIT Bombay.
Dipak Narayan, Debasri Chakrabarty, Prabhakar Pande and P. Bhattacharyya. 2002. An Experience in Building the Indo WordNet - a WordNet for Hindi. International Conference on Global WordNet (GWC 02), Mysore, India.
Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. The MIT Press.
J. Ramanand, Akshay Ukey, Brahm Kiran Singh, Pushpak Bhattacharyya. 2007. Mapping and Structural Analysis of Multi-lingual Wordnets. IEEE Data Engineering Bulletin, 30(1).
Kamil Bulke. 1997. An English-Hindi Dictionary (ed.). S. Chand & Co, New Delhi, India.
Lewis Henry Morgan. 1871. Systems of Consanguinity and Affinity of the Human Family. Smithsonian Contributions to Knowledge, v. 218, Washington DC.
Mitesh Khapra, Sapan Shah, Piyush Kedia and Pushpak Bhattacharyya. 2009. Projecting Parameters for Multilingual Word Sense Disambiguation. Empirical Methods in Natural Language Processing (EMNLP09), Singapore.
Dr. S. Awasthi and Dr. (Smt.) I. Awasthi. 2000. Chambers English-Hindi Dictionary (ed.). Allied Publisher Limited, New Delhi, India.
www.Shabdkosh.com
www.wikipedia.org
http://pustak.org/bs/home.html
http://www.thefreedictionary.com
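
The two-way linkage strategy from the Solutions slides, together with the consistency of the reported statistics, can be illustrated with a minimal Python sketch. This is not the WN Synset Linkage Tool itself: the names `Link`, `link_synset`, `HYPERNYM_TARGETS`, `check_statistics`, `coverage` and the category keys are hypothetical and introduced only for illustration. The hypernym targets and the counts are taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fallback table: category -> English hypernym.
# The targets come from the slides; the category keys are made up here.
HYPERNYM_TARGETS = {
    "uncle_kinship": "uncle",
    "drum_instrument": "drum",
    "kitchen_utensil": "tableware",
    "tool": "tool",
    "millet_grain": "millet",
    "caste": "jati",
    "occupation": "occupation",
    "wage": "wage",
    "woman_of_caste": "jati",
    "wife_of_caste_or_profession": "wife",
}


@dataclass
class Link:
    hindi: str
    english: str
    kind: str  # "direct" or "hypernymy"


def link_synset(hindi: str, category: str,
                exact_equivalent: Optional[str] = None) -> Optional[Link]:
    """Prefer a direct link; otherwise fall back to a hypernymy link."""
    if exact_equivalent is not None:
        return Link(hindi, exact_equivalent, "direct")
    target = HYPERNYM_TARGETS.get(category)
    if target is not None:
        return Link(hindi, target, "hypernymy")
    return None  # no link possible yet: the synset stays skipped or pending


# Counts as reported on the Statistics slide.
STATS = {
    "total": 34343,
    "linked": 15091,
    "skipped": 15550,
    "pending": 3702,
    "direct": 15071,
    "hypernymy": 20,
}


def check_statistics(stats: dict) -> bool:
    """Check that the reported counts add up."""
    parts_ok = (stats["linked"] + stats["skipped"] + stats["pending"]
                == stats["total"])
    kinds_ok = stats["direct"] + stats["hypernymy"] == stats["linked"]
    return parts_ok and kinds_ok


def coverage(stats: dict) -> float:
    """Fraction of Hindi synsets that have been linked so far."""
    return stats["linked"] / stats["total"]


if __name__ == "__main__":
    print(link_synset("चाचा", "uncle_kinship"))
    print(link_synset("शेर", "species", exact_equivalent="tiger"))
    print(check_statistics(STATS), round(coverage(STATS), 3))
```

In this sketch a synset with a known exact English equivalent always gets a direct link, and the hypernym table is consulted only as a fallback, which mirrors the order of preference described on the slides. The reported counts are internally consistent: linked, skipped and pending synsets sum to the total, and direct plus hypernymy links sum to the linked count, giving a coverage of roughly 44%.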