Jaya Saraswati
Rajita Shukla
Ripple P. Goyal
Pushpak Bhattacharyya

Linking of Hindi wordnet (version 1.2) to the Princeton
WordNet (version 2.1)

Scenario in a Multilingual Country like India
o 22 official languages and hundreds of dialects
o Several language families: Indo-European (Indo-Aryan), Dravidian, Austro-Asiatic, Tibeto-Burman

1. Need for Linkage
2. Challenges in Linkage
3. Solutions
4. WN Synset Linkage Tool
5. Statistics

Creation of Bilingual Dictionaries

NLP tasks like Machine Translation and Cross Lingual
Information Retrieval

Word Sense Disambiguation even in the absence of
sense tagged corpora in target language

Create a wide wordnet grid of shared concepts.

Kinship Relations
Musical Instruments
Kitchen Utensils
Tools
Species
Grains
Castes
Occupations
Wages
Women denoting Caste and Occupation

Kinship Relations
Uncle: चाचा (caacaa), मौसा (mausaa), फूफा (phoophaa), मामा (maamaa)

Musical Instruments
drum: तबला (tabalaa), नगाडा (nagaadaa), मृदंग (mridang), ढोल (dhol)

Kitchen Utensils
◦ Specific utensils
डोंगा (dongaa - bowl); कटोरदान (katoradaan - container)
◦ Size difference
कलछा (kalachhaa - big ladle); कलछी (kalachhii - small ladle)

Tools
o No exact matches in English
कनखोदनी (kanakhodanii); अंकुसी (ankusii)
(very specific kinds of tools)
o Size difference
खुरपा (khurpaa – big spud); खुरपी (khurpii – small spud)

Species
o English WordNet does not always have synsets for the male and female of the species
मेंढक (meⁿḍhaka – male frog); मेंढकी (meⁿḍhakii – female frog)
o Some English concepts do not have separate synsets for the species and the male of the species
शेर (śera – denoting the species tiger); शेर (śera – denoting the male tiger)

Grains
Millet: ज्वार (jwaara), बाजरा (baajaraa), मँड़ुआ

Castes
लुहार (luhaara – a member of the caste of ironsmiths)

धोबी (dhobi – a member of the caste of people who wash clothes)


Occupations
लुहारी (luhaarii – occupation/work of an ironsmith)
सुनारी (sunaarii – occupation/work of a goldsmith)

Wages
ढुलाई (dhulaaii – wages for carrying/transporting)

पुताई (putaaii – wages for house painting)

Women denoting Caste and Occupation
o Women of various castes
धोबिन (dhobina – a woman belonging to the caste of washermen)
o Wives of men from a certain caste or profession
धोबिन (dhobina – wife of a washerman)

Two kinds of linkages:
◦ Direct Linkage for synsets having exact
equivalents in English
◦ Hypernymy Linkage for synsets which cannot
be linked directly to English concepts
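
The two kinds of linkage can be pictured as a small data structure with a fallback rule. The sketch below is only an illustration: the names `LinkType`, `SynsetLink` and `link` are hypothetical and are not taken from the Hindi WordNet or from the WN Synset Linkage Tool, and the English targets passed in are supplied by hand.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkType(Enum):
    DIRECT = "direct"        # an exact English equivalent exists
    HYPERNYMY = "hypernymy"  # linked to a more general English concept


@dataclass(frozen=True)
class SynsetLink:
    hindi_lemma: str      # head word of the Hindi synset
    english_lemma: str    # head word of the English synset it is linked to
    link_type: LinkType


def link(hindi_lemma: str,
         english_equivalent: Optional[str],
         english_hypernym: Optional[str]) -> Optional[SynsetLink]:
    """Prefer a direct link; fall back to a hypernymy link; else no link."""
    if english_equivalent is not None:
        return SynsetLink(hindi_lemma, english_equivalent, LinkType.DIRECT)
    if english_hypernym is not None:
        return SynsetLink(hindi_lemma, english_hypernym, LinkType.HYPERNYMY)
    return None


# "spud" as the direct target of खुरपी is assumed from its gloss.
print(link("खुरपी", "spud", None))
# चाचा has no exact English equivalent, so it falls back to "uncle".
print(link("चाचा", None, "uncle"))
```

A synset for which neither target is available returns `None`, which would correspond to a synset that is skipped or left for later consideration.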

Examples of hypernymy linkage :
◦ चाचा (caacaa) and मामा (maamaa) – to be linked to uncle
◦ तबला (tabalaa) etc. to be linked to drum
◦ डोंगा (dongaa) – to be linked to tableware
◦ कनखोदनी (kanakhodanii) – to be linked to tool
◦ ज्वार (jwaara), बाजरा (baajaraa) - to be linked to millet
o Terms denoting caste – to be linked to jati
o Terms denoting professions – to be linked to occupation
o Terms denoting remunerations – to be linked to wage
o Terms for women of various castes – to be linked to jati
o Terms for wives of men belonging to various castes and occupations – to be linked to wife
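
The category rules above amount to a lookup from the category of a Hindi term to the English concept used as its hypernymy target. A minimal sketch, assuming the category of each term is already known; the category labels and the function name are hypothetical, and only the English targets come from the list above.

```python
from typing import Optional

# Category of a Hindi term -> English concept used as the hypernymy target.
# The category labels on the left are illustrative names, not tool output.
HYPERNYM_TARGET = {
    "caste": "jati",
    "profession": "occupation",
    "remuneration": "wage",
    "woman_of_caste": "jati",
    "wife_of_caste_or_occupation": "wife",
}


def hypernymy_target(category: str) -> Optional[str]:
    """Return the English link target for a category, or None if unknown."""
    return HYPERNYM_TARGET.get(category)


examples = [
    ("लुहार", "caste"),
    ("लुहारी", "profession"),
    ("ढुलाई", "remuneration"),
    ("धोबिन", "woman_of_caste"),
    ("धोबिन", "wife_of_caste_or_occupation"),
]
for lemma, category in examples:
    print(lemma, category, "->", hypernymy_target(category))
```

The two entries for धोबिन show why the lookup is keyed by the sense's category rather than by the word: the same word links to jati in one sense and to wife in the other.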

Size Differentiation in Tools and Utensils
o Direct linkage for the more popular term (as in खुरपी khurpii)
o Hypernymy linkage to be used for the other (as in खुरपा khurpaa)

Species and the male of the species
o Direct linkage for the term denoting the species (शेर śera – linked to tiger)
o Hypernymy linkage to be used to denote the male (शेर śera – again linked to tiger)
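
Under this rule one English synset can receive two Hindi links of different types. The sketch below illustrates that with plain tuples; the helper `link_pair` and the sense labels in parentheses are hypothetical, and using "spud" as the target for the खुरपी/खुरपा pair is an assumption based on the glosses given earlier.

```python
from collections import defaultdict


def link_pair(primary, secondary, english):
    """Direct link for the primary term, hypernymy link for the secondary."""
    return [(primary, english, "direct"), (secondary, english, "hypernymy")]


# Species and the male of the species: both senses point at "tiger".
links = link_pair("शेर (species)", "शेर (male)", "tiger")
# Size differentiation: the target "spud" is assumed from the glosses.
links += link_pair("खुरपी", "खुरपा", "spud")

# Group the links by English target to see what each synset receives.
by_target = defaultdict(list)
for hindi, english, kind in links:
    by_target[english].append((hindi, kind))

for english, entries in by_target.items():
    print(english, entries)
```

Keeping the link type alongside the target preserves the distinction between the two Hindi synsets even though they share one English synset.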

Statistics
Total Hindi synsets                               34343
Number of synsets linked                          15091
Number of synsets skipped                         15550
Number of synsets left for first consideration     3702
Hypernymy linked                                     20
Direct linked                                     15071
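
The figures are internally consistent: linked, skipped and remaining synsets add up to the total, and direct plus hypernymy links add up to the linked count. A short check, with a hypothetical `pct` helper to express the counts as shares (roughly 44% linked, 45% skipped and 11% remaining):

```python
total = 34343
linked = 15091
skipped = 15550
left_for_first_consideration = 3702
direct = 15071
hypernymy = 20

# The three statuses partition the full set of Hindi synsets.
assert linked + skipped + left_for_first_consideration == total
# Every linked synset is either directly linked or hypernymy linked.
assert direct + hypernymy == linked


def pct(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


for label, count in [
    ("linked", linked),
    ("skipped", skipped),
    ("left for first consideration", left_for_first_consideration),
]:
    print(f"{label}: {count} ({pct(count, total)}%)")
print(f"hypernymy share of linked: {pct(hypernymy, linked)}%")
```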

Linking of the Hindi wordnet to the English wordnet,

The Challenges therein, and

The Solutions: a strategy of using Direct and
Hypernymy Linkages, which helps in maximizing linkages
