FIRE 2013 Presentation on: Transliterated Search using Syllabification Approach
By: Hardik Joshi1, Apurva Bhatt1, Honey Patel2
{hardikjjoshi,apurva.bhatt7,Honeypatel.39}@gmail.com
1 Department of Computer Science, Gujarat University, Ahmedabad, India.
2 L.J. College of Engineering, Ahmedabad, India.
FIRE, 4th Dec 2013

Content
Introduction
Our Approach
Syllabification
Our Results
Error and Analysis
Conclusion

Introduction
There is a need to provide local-language support in web-based applications, because many domains, such as e-commerce sites, otherwise require knowledge of English.
The challenge in transliteration is ambiguity. For the word “राष्ट्रपति”, the spellings “rashtrapati”, “rashtrapathi”, “raashtrapathy” and “raashtrpati” are all possible, and deciding which one is correct is itself an issue.
Transliteration becomes more difficult in the presence of out-of-vocabulary (OOV) words and noisy words.
In both subtasks, transliteration was performed using a syllabification approach.
In subtask-1, we performed morphological analysis of English words and then used a corpus-based approach to identify frequently occurring Hindi words.
In subtask-2, queries were formulated in two ways for separate run submissions: one containing both Roman and Devanagari script, and one containing Roman script only.

Syllabification Approach
Linguists have observed that different languages place constraints on possible consonant and vowel sequences; these constraints characterize not only the word structure of the language but also its syllable structure.
A syllable consists of an onset and a rhyme, and the rhyme consists of a nucleus and a coda. Vowels sit at the center (nucleus), consonants at the beginning form the onset, and consonants at the end form the coda.
Syllable structure example word: “sprint” (onset “spr”, nucleus “i”, coda “nt”).

Training Format
Source      Target
sudakar     स ु द ा क र
chhagan     छगण
jitesh      ज ि त े श
narayan     न ा र ा य ण
shiv        श ि व
madhav      म ा ध व
mohammad    म ो ह म ् म द

Algorithm for subtask-1
Step 1: Look up each word in an English dictionary.
Step 2: Perform spell-checking, stemming and morphological analysis for English; if there is no spelling error and a match is found, label the word as English (\E).
Step 3: If the word is not found as an English word, check it against an English corpus of US newspapers.
Step 4: If the word is found there, check it against an English corpus of Indian newspapers.
Step 5: If the word is found in the US newspaper corpus but not in the Indian newspaper corpus, label it \E.
Step 6: Steps 2 and 5 are applied in parallel to identify English words, which are labelled \E.
Step 7: The remaining words are to be transliterated into Hindi and are labelled \H.
Step 8: Apply the Moses tool, which transliterates these Roman-script words into Hindi.

Result of Subtask-1

Results for Subtask 2
Run 1: “मर सापन न कक रानी काब आयगी ि mere sapnon ki rani kab aayegi tu”.
Run 2: “mere sapnon ki rani kab aayegi tu”.

Metrics    Run-1    Run-2    Maximum Score    Median Score
nDCG@5     0.5627   0.5262   0.8052           0.5620
nDCG@10    0.5619   0.5232   0.8002           0.5608
MAP        0.2546   0.2163   0.4236           0.2355
MRR        0.5835   0.5730   0.8440           0.5884

Error and Analysis
Several problems in the transliteration decreased the precision.
Errors in the maatra: “sapnon” => “सापन न”, “ki” => “की”, “kab” => “काब”, “main” => “ममन”, “mein” => “मीन”, “na” => “न”, “ka” => “क”.
Multiple mappings of a letter, e.g. T = त, ट: “tera” => “टरा”, “tum” => “तूम”, “to” => “ट”, “teri” => “टरर”.
Missing sounds (फ, ख, छ ‘chh’, ksh): for the word “accha” we got “आक्का”, and for “poochho” we got “पछ ू ट”.
Multiple transliterations: c, k.
Vowels are not transliterated correctly, e.g. “lo” => “लॉ”, “ho” => “ह र”, “ko” => “कॉ”.
Spelling variations (shree, shri).
Conjunct formation (“kya” => “कया”).
Missing vowels: ‘ak tr khan’ (अक ु िर खान).
‘y’ as a vowel: ‘anthony’ and ‘Shyam’.

Conclusion
We used the syllabification approach and considered the most probable term in the transliteration process.
The word labeling task was performed under the assumption that a term belongs either to English or to Hindi.
We achieved high English recall because the labeling approach combined morphological analysis with dictionary lookup.
However, the syllabification model did not transliterate with high precision, which lowered the precision on the transliteration tasks and, subsequently, the precision metrics on the song lyrics retrieval task.
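
As an illustration of the syllable structure used in the syllabification approach (onset, nucleus, coda), the following is a minimal Python sketch that splits a single Roman-script syllable into its three parts. It is not the system presented in this talk: the function name split_syllable and the vowel set are assumptions made for this example, and 'y' is treated as a consonant, which is exactly the simplification that breaks on words such as ‘anthony’ and ‘Shyam’.

```python
# Hypothetical helper, not part of the presented system.
# Assumption: only a, e, i, o, u count as vowels ('y' is a consonant here).
VOWELS = set("aeiou")


def split_syllable(syllable):
    """Split one Roman-script syllable into (onset, nucleus, coda)."""
    s = syllable.lower()

    # Onset: every consonant before the first vowel.
    start = 0
    while start < len(s) and s[start] not in VOWELS:
        start += 1

    # Nucleus: the run of vowels that follows the onset.
    end = start
    while end < len(s) and s[end] in VOWELS:
        end += 1

    # Coda: whatever remains after the nucleus.
    return s[:start], s[start:end], s[end:]


if __name__ == "__main__":
    print(split_syllable("sprint"))  # ('spr', 'i', 'nt')
    print(split_syllable("shiv"))    # ('sh', 'i', 'v')
```

The sketch handles one syllable at a time; splitting a multi-syllable word such as “narayan” would first require deciding where the syllable boundaries fall, which this example does not attempt.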