Special Indices for LaaLaLaa Lyric Analysis & Generation Framework
Dr. V. Madhan Karky, Tamil Computing Lab (TaCoLa), College of Engineering Guindy, Anna University, Chennai.

Overview
• Objective
• Introduction
• Background
• Rhyme Schemes in Tamil
• Meter Pattern
• System Architecture
• Indexing Structure
• Indexing Algorithm
• Results and Analysis

Objective
• To build special indices for the LaaLaLaa lyric analysis and generation framework to facilitate faster retrieval based on
  – Meter Pattern
  – Rhyme

Introduction
• Tamil is a vibrant language with a rich grammar and vocabulary and an inherent poetic flavor.
• About 1000 lyrics are written every year for private albums, jingles and original soundtracks of mainstream movies.

Background
• WASP (Pablo Gervás (2000)) splits a given block of text, identifies patterns and fits words from the vocabulary to produce verses with a similar pattern.
• COLIBRI (Agudo, Gervás, Calero (2002)) follows a case-based approach to poem generation.
• Tra-la-Lyrics (Oliveira, Cardoso, Pereira (2005)) finds the beat pattern of the MIDI file and places words with a similar syllabic division and stress pattern.
• Automatic generation of Tamil lyrics for melodies (Ramakrishnan, Kuppan, Devi (2009)) converts the MIDI file to a KNM representation and fits words to it from a corpus. Phrases are meaningful only in parts. No edhugai, monai or iyaibu.
• LaaLaLaa (Sowmiya, Karky (2010)) splits raw text from a MIDI file into templates and fills them with words from a wordnet, following patterns mined from an existing corpus of lyrics, with due consideration to rhyme, meaning and flow.

Rhyme Schemes in Tamil
• Two words are said to rhyme in
  – Monai (மோனை) - first letters are the same.
  – Edhugai (எதுகை) - second letters are the same.
  – Iyaibu (இயைபு) - last letters are the same.
• Examples:
  – பறவை and பச்சை rhyme in monai.
  – அருவி and விருப்பு rhyme in edhugai.
  – காக்கை and வாழ்க்கை rhyme in iyaibu.
  – அருவி and குருவி rhyme in edhugai and iyaibu.
  – கவிதைகள் and கவிஞர்கள் rhyme in all three schemes.

Meter Pattern
• Maathirai (மாத்திரை) - the time taken to blink an eyelid.
• Maathirai-based classification of Tamil letters.
  – Nedil (N) (நெடில்) - letters pronounced for 2 maathirai.
  – Kuril (K) (குறில்) - letters pronounced for 1 maathirai.
  – Mei (M) (மெய்) - letters pronounced for 0.5 maathirai.
• The meter pattern of a word is its Kuril-Nedil-Mei pattern.
• For example, the meter pattern of the word பாடல் is NKM, as பா is a Nedil (N), ட is a Kuril (K) and ல் is a Mei (M).

System Architecture
[Diagram: Lyric DB, Word Object Convertor, Rhyme Extractor, Rhyme Pattern Extractor, Index Builder, Rhyme Meter Index]

Indexing Structure
[Diagram: a tree rooted at Part of Speech, with one branch each for மோனை, எதுகை and இயைபு; each branch holds MeterPattern1 and MeterPattern2, and each meter pattern holds Letter1 → Words and Letter2 → Words]

Indexing Algorithm

Results and Analysis
[Chart: retrieval time (in milliseconds, 0 to 4000) per word (1 to 500) for the Meter Rhyme Indexed Approach and the Word Indexed Approach]
• The retrieval complexity of both approaches was tested using a dataset of 500 Tamil words.
• The average retrieval time in
  – Word indexed approach - 875.47 milliseconds
  – Meter Rhyme Indexed approach - 1.90 milliseconds
• The drastic decrease in retrieval time, from O(α) to O(1) [α is the number of words in the database], is due to
  – The use of hash tables, which are efficient for retrieval.
  – Having separate hash tables for the மோனை, எதுகை and இயைபு of each POS.

Thank You!!!
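
Appendix: Sketch of a Meter-Rhyme Index
The meter pattern and indexing structure described in the slides can be illustrated with a minimal Python sketch. This is not the LaaLaLaa implementation. The names split_letters, letter_class, meter_pattern and RhymeMeterIndex are hypothetical, ஐ, ஔ and their vowel signs are assumed to count as Nedil, the input is assumed to contain only Tamil letters, and the Part of Speech → rhyme scheme → meter pattern → letter tree is flattened into a single hash table keyed by a tuple.

```python
import unicodedata
from collections import defaultdict

PULLI = "\u0bcd"  # dot above a consonant: marks a Mei letter
# Long independent vowels (assumption: ai and au are counted as Nedil).
LONG_VOWELS = set("\u0b86\u0b88\u0b8a\u0b8f\u0b90\u0b93\u0b94")
# Long and short dependent vowel signs.
LONG_SIGNS = set("\u0bbe\u0bc0\u0bc2\u0bc7\u0bc8\u0bcb\u0bcc")
SHORT_SIGNS = set("\u0bbf\u0bc1\u0bc6\u0bca")
MARKS = LONG_SIGNS | SHORT_SIGNS | {PULLI}


def split_letters(word):
    """Group code points into Tamil letters (base plus vowel sign or pulli)."""
    letters = []
    # NFC composes two-part vowel signs such as o/oo into one code point.
    for ch in unicodedata.normalize("NFC", word):
        if ch in MARKS and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def letter_class(letter):
    """Return 'M' (Mei), 'N' (Nedil) or 'K' (Kuril) for one letter."""
    if letter.endswith(PULLI):
        return "M"
    if letter in LONG_VOWELS or letter[-1] in LONG_SIGNS:
        return "N"
    return "K"


def meter_pattern(word):
    """Kuril-Nedil-Mei pattern of a word, e.g. 'NKM'."""
    return "".join(letter_class(l) for l in split_letters(word))


class RhymeMeterIndex:
    """Hash table keyed by (POS, rhyme scheme, meter pattern, letter)."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, word, pos):
        letters = split_letters(word)
        if not letters:
            return
        meter = meter_pattern(word)
        keys = {"monai": letters[0], "iyaibu": letters[-1]}
        if len(letters) > 1:
            keys["edhugai"] = letters[1]
        for scheme, letter in keys.items():
            self._table[(pos, scheme, meter, letter)].append(word)

    def lookup(self, pos, scheme, meter, letter):
        """Single dictionary access, independent of the number of words."""
        return list(self._table.get((pos, scheme, meter, letter), []))
```

For instance, after adding அருவி, குருவி and பறவை with the POS label "noun", lookup("noun", "edhugai", "KKK", "ரு") returns the first two words, since both have the meter pattern KKK and share the second letter ரு. A lookup costs one hash access regardless of how many words are stored, which is the property the O(1) retrieval claim relies on.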