MORPHOLOGICAL ANALYSIS OF KOKBOROK FOR UNIVERSAL NETWORKING LANGUAGE DICTIONARY

Swapan Debbarma, Assistant Professor, Department of Computer Science, National Institute of Technology Agartala, India (swapanxavier@gmail.com)
Khumbar Debbarma, Braja Gopal Patra, M.Tech Students, Department of Computer Science, National Institute of Technology Agartala, India (khum_10jan@yahoo.co.in, brajagopal_patra@yahoo.com)

Abstract: This paper focuses on the morphological analysis of Kokborok words so that they can be incorporated into a Kokborok dictionary and a Kokborok machine translator. So far, no attempt has been made to integrate such work into a concrete computational output. In this paper we bring the work on morphological analysis into a common frame, with the goal of producing a Kokborok dictionary and a machine translator that provide a unified base fitting into the already developed universal conversion systems of UNL. We explain the morphological rules of Kokborok words for UNL structures. These rules capture how parts of speech are modified with regard to tense, person, subject, etc. of the words of a sentence. Here we outline the morphology of nouns, verbs and adjective phrases only.

Keywords: Morphology, Kokborok, Universal Networking Language (UNL), UNL-Kokborok dictionary.

I. INTRODUCTION
Kokborok is the native language of the Tripuris of Tripura, one of the North-East states of India. Tripura is bordered by the Indian states of Assam and Mizoram and by the country Bangladesh. Kokborok speakers, including many Tripuris living in Myanmar, Nepal, Bhutan and Bangladesh, number about 2.5 million [3][5]. The Kokborok language is regarded as highly systematic, and its script is similar to the Roman script. It belongs to the Bodo sub-group of the Tibeto-Burman language family and has unique features compared with other South-Asian Tibeto-Burman languages.
Because of the language barrier, ordinary people face a major obstacle in enjoying the full benefits of modern information and communication technology (ICT) and of the vast English-language knowledge base available around the globe. UNL, a formal language for representing the meaning of natural-language sentences, is a specification for the exchange of information. UNL includes 16 languages: the six official languages of the United Nations (Arabic, Chinese, English, French, Russian and Spanish) and ten other widely spoken languages (German, Hindi, Italian, Indonesian, Japanese, Latvian, Mongol, Portuguese, Swahili and Thai). Chinese, one of the languages included in UNL, belongs to the Sino-Tibetan language family, as Kokborok does. Including Kokborok in UNL will therefore facilitate the interaction of Kokborok-speaking people with others.
Morphology is the field of linguistics that studies the structure of words. It focuses on patterns of word formation within and across languages and attempts to formulate rules that model the knowledge of the speakers of those languages. Three approaches to the study of morphology are distinguished [6]:
• Morpheme-based morphology, which makes use of an Item-and-Arrangement approach.
• Lexeme-based morphology, which normally makes use of an Item-and-Process approach.
• Word-based morphology, which normally makes use of a Word-and-Paradigm approach.
The first is the most common, and we concentrate our efforts on it.
Morpheme-based morphology: here word forms are analyzed as arrangements of morphemes. A morpheme is defined as the minimal meaningful unit of a language. In a word like independently, the morphemes are in-, depend, -ent and -ly; depend is the root and the other morphemes are, in this case, derivational affixes. In a word like dogs, dog is the root and -s is an inflectional morpheme.
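As a rough illustration of Item-and-Arrangement analysis, the Python sketch below segments a word into optional prefixes, a root and optional suffixes by matching against small morpheme inventories. The inventories (PREFIXES, ROOTS, SUFFIXES) and the functions segment and split_suffixes are hypothetical names introduced only for this example, and they cover just the two English words discussed above. A real analyzer would need a full lexicon and would also have to handle spelling changes at morpheme boundaries, which this sketch ignores.

```python
from __future__ import annotations

# Hypothetical morpheme inventories, just large enough for the examples above.
PREFIXES = ["in"]
ROOTS = {"depend", "dog"}
SUFFIXES = ["ly", "ent", "s"]


def split_suffixes(rest: str) -> list[str] | None:
    """Split `rest` into a sequence of known suffixes; None if impossible."""
    if rest == "":
        return []
    for suf in SUFFIXES:
        if rest.startswith(suf):
            tail = split_suffixes(rest[len(suf):])
            if tail is not None:
                return ["-" + suf] + tail
    return None


def segment(word: str) -> list[str] | None:
    """Analyze `word` as prefix* + root + suffix*, like beads on a string."""
    # Case 1: the word starts with a known root followed only by suffixes.
    for i in range(1, len(word) + 1):
        if word[:i] in ROOTS:
            tail = split_suffixes(word[i:])
            if tail is not None:
                return [word[:i]] + tail
    # Case 2: strip a known prefix and analyze the remainder recursively.
    for pre in PREFIXES:
        if word.startswith(pre):
            rest = segment(word[len(pre):])
            if rest is not None:
                return [pre + "-"] + rest
    return None


if __name__ == "__main__":
    for w in ["independently", "dogs"]:
        print(w, "->", " + ".join(segment(w)))
```

The search returns the first analysis it finds; for ambiguous words a fuller analyzer would collect all analyses and rank them. In principle the same routine could be applied to suffixing Kokborok forms such as bufangrok (bufang + -rok) by swapping in Kokborok inventories.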
Analyzing word forms in this way, as if words were made of morphemes placed one after another like beads on a string, is called the Item-and-Arrangement approach [6]. Morphological analysis is thus centered on the analysis and generation of word forms: it deals with the internal structure of words and with how words can be formed. Morphology plays an important role in applications such as spell checking, electronic dictionary interfaces and information retrieval systems, where words that are only morphological variants of each other must be identified and treated similarly. In natural language processing (NLP) and machine translation (MT) systems, we need to identify words in texts in order to determine their syntactic and semantic properties [4]. Morphological study helps here by supplying rules for analyzing the structure and formation of words.

II. KOKBOROK MORPHEMES TO BE REPRESENTED AS A
A Kokborok morpheme, besides the root word, is represented in the Kokborok-UNL dictionary using the following UNL format:
[HW]{ID}"UW"(ATTRIBUTE, ATTRIBUTE, …)<FLG, FRE, PRI>
where
HW: Head Word (Kokborok word)
ID: Identification of the Head Word (may be omitted)
UW: Universal Word
ATTRIBUTE: Attribute of the HW
FLG: Language Flag
FRE: Frequency of the Head Word
PRI: Priority of the Head Word
The attributes describe the nature of the head word by classifying it according to grammatical, semantic or morphological features. We are therefore particularly concerned with representing morphemes through these attributes. We use the following divisions to present the morphological structure of Kokborok words.
A. Morphological Analysis of Simple Kokborok Words
Traditional grammars of Kokborok recognize five parts of speech. Because part-of-speech classification plays a central role in explaining the structure of a sentence, we have to deal with five types of words in Kokborok sentences, namely
noun, adjective, pronoun, preposition and verb. In this section we concentrate on nouns, adjectives and verbs only. Two types of morphology are recognized for simple Kokborok words [4].
1) Inflectional Morphology
Inflectional morphology derives a word from another word form by adding certain grammatical features while keeping the same part of speech or category. A number of inflectional suffixes indicate, for example, the number of the nouns and pronouns of a sentence. The inflections of the different word classes are explained below.

TABLE I.
EXAMPLES OF NOUN MORPHOLOGY
Number | Root word | Word as it appears in a sentence
Singular | chwla (boy) | chwlano (to the boy)

• Nouns. Kokborok has a strong, well-structured inflectional morphology for nouns based on case. The case of a noun may be nominative ("chwla", boy), accusative ("chwla-no", to the boy), genitive ("chwla-ni", of the boy), locative ("nog-o", in/at the house) and so on. Gender and number are also important for identifying the proper categories of nouns. Number may be singular ("ri", cloth) or plural ("rirok", clothes). The gender of a noun can be masculine ("takhuk", brother), feminine ("bukhuk", sister), common ("cherai", child) or neuter ("swikong", pen). We consider six types of nouns and show a possible representation of their inflectional suffixes in the UNL format. Some examples of noun analysis based on number are shown in Table I.
• Proper noun. Proper nouns are the names of specific things, e.g. Khumbar (name of a person) and Saidra (name of a river). For example, in "Khumbarno twi ridi" (give water to Khumbar), "no" is the inflectional morpheme:
[Khumbar]{}"Khumbar(iof>person)"(N,FEMALE,ANI,3SG)<K,0,0>
[no]{}"no"(NMOR)<K,0,0>
• Common noun. Common nouns are the general names of things, e.g. bufang (tree), aa (fish), borok (man), wak (pig) and twi (water). Common nouns can be singular or plural. For example, "bufangrok tanwi khibikha" (the trees have been cut).
Other examples are "bufango muisurum kwbangma" (there are many ants on the tree) and "bufangni uklogo huijakdi" (hide behind the tree). Here "rok", "o" and "ni" are inflectional morphemes:
[bufang]{}"tree(icl>plantation)"(N)<K,0,0>
[rok]{}"rok"(NMOR)<K,0,0>
• Material noun. These are the names of materials, for example rangchak (gold) and watwi (rain), as in "rangchakni motok" (golden crown) and "watwio ta sijakdi" (don't get wet in the rain). Here "ni" and "o" are inflectional morphemes:
[rangchak]{}"gold(icl>thing)"(N)<K,0,0>
[ni]{}"ni"(NMOR)<K,0,0>
• Abstract noun. These nouns name qualities or states, e.g. tongthokma (happiness), khakhamma (sadness), wanama (tension) and kirima (fear). In each of these, "ma" is the inflectional morpheme; for example, "ang rwchabwi tongthokma mano" (I get happiness from singing):
[tongthok]{}"happiness(icl>state)"(N)<K,0,0>
[ma]{}"ma"(NMOR)<K,0,0>
• Verbal noun. These are verb forms that function as nouns, e.g. "ang bazaro thangna nango" (I need to go to the market) and "ang khwna rwchabna nango". Here "na" is the inflectional morpheme:
[thang]{}"go(icl>do)"(V)<K,0,0>
[na]{}"na"(NMOR)<K,0,0>
• Collective noun. These nouns are the collective names of people or things, e.g. borok (people), dopha (clan) and hoda (society), as in "chini borokrokno sadi" (tell our people):
[borok]{}"people(icl>human)"(N)<K,0,0>
[rokno]{}"rokno"(NMOR)<K,0,0>
• Adjective morphology. Kokborok adjectives also carry inflectional affixes; the examples in Table II are prefixes.

TABLE II.
EXAMPLES OF ADJECTIVE MORPHOLOGY
Root word | Word as it appears in a sentence | Inflectional prefix
sok (to rot) | kosok (rotten) | ko

Dictionary entries for adjective roots and prefixes include:
[ham]{}"good(iof>quality)"(N)
[ka]{}"ka"(ADJMORPH)
[sok]{}"rot(icl>do)"(V)
[ko]{}"ko"(ADJMORPH)

TABLE III.
MORPHOLOGY OF ROOT VERBS CHA (EAT) AND KHAI (DO)
Person | Tense | Verb as it appears | Inflectional suffix
First | Present | chao / khaio | o
First | Present continuous | chawi tongo / khaiwi tongo | witongo
First | Past | chakha / khaikha | kha
First | Past continuous | chawi tongmani / khaiwi tongmani | witongmani
First | Past perfect | chamani / khaimani | mani

Kokborok verb morphology is highly diverse. For example, taking "cha" as the root word, adding "kha" gives "chakha", which means the action was done in the past. Similarly, adding "witongo" means the action is being done in the present, and adding "witongmani" means the action was being done in the past. We applied morphological analysis for different persons to find the actual meaning of each verb form. Table III shows data for the root verbs cha (eat) and khai (do).
As a further example, consider the Kokborok word "swi", which means write. Its dictionary entry is [swi]{}"write(icl>do)"(list of semantic and syntactic attributes)<K,0,0>, where "swi" is the head word. Some possible transformations of "write" in the Kokborok-UNL dictionary are shown below.
//For first person
[swi]{}"write(icl>do)"(V,@present,indf)
[-o]{}"o"(VMOR,@present,indf)
[-witongo]{}"witongo"(VMOR,@present,cont)
[-kha]{}"kha"(VMOR,@past,indf)
[-witongmani]{}"witongmani"(VMOR,@past,cont)
[-mani]{}"mani"(VMOR,@past,perf)
[-jago]{}"jago"(VMOR,@fact)
[-nai]{}"nai"(VMOR,@future,indf)
//For second person
[-di]{}"di"(VMOR,@present,indf)
[-kha]{}"kha"(VMOR,@past,indf)
[-nai]{}"nai"(VMOR,@future,indf)
//For third person
[-o]{}"o"(VMOR,@present,indf)
[-witongo]{}"witongo"(VMOR,@present,cont)
[-kha]{}"kha"(VMOR,@past,indf)
[-witongmani]{}"witongmani"(VMOR,@past,cont)
[-mani]{}"mani"(VMOR,@past,perf)
[-nai]{}"nai"(VMOR,@future,indf)
Ordering the dictionary with the root word followed by its derived forms allows a quick search for the UW and the attributes of a Kokborok word.
2) Derivational Morphology
This is the morphology of words similar to compound words.
Such words are combinations of two or more words, which may belong to the same or to different parts of speech. For example, in "Tokkoknai", "Tok" (bird) is a noun and "Koknai" (shooter) is also a noun, i.e. Noun + Noun = Noun; in "amingbolong" (wildcat), "aming" is a noun and "bolong" is an adjective, i.e. Noun + Adjective = Noun.

B. Morphological Analysis of Compound Kokborok Words
Compound words are words that contain more than one root word. Different types of compound words are given below.
Noun + Noun = Noun: "Ma-Pha" (mother and father)
Noun + Adjective = Adjective: "Bwkhnai kuphur" (white hair/grey hair)
Pronoun + Noun + Noun = Noun: "Chini-ha-ama" (our-mother-land, i.e. our motherland)

C. Addition of Prefixes to the Root Word in Compound Kokborok Words
In Kokborok, some compound words are formed from a root word by adding prefixes, and the prefixes change according to the person. As an example, we have taken the root word "ChwiChu" (grandmother and grandfather).

III. CONCLUSION AND FUTURE WORK
This was challenging work, as very little work has been done on Kokborok. In this paper we have analyzed the morphology of Kokborok and outlined a procedure for developing a Kokborok dictionary in the UNL format. We implemented some rules for the morphological analysis of simple and compound Kokborok words that can be used to build a UNL-Kokborok dictionary. Our future plans in this regard are to:
• develop morphological rules for Kokborok roots, verbal suffixes and primary suffixes;
• structure the dictionary entries of Kokborok morphemes for rule generation.

REFERENCES
[1] Md. Nawab Yousuf Ali, S. M. Abdullah Al-Mamun, Jugal Krishna Das, Abu Mohammad Nurannabi, "Morphological Analysis of Bangla Words for Universal Networking Language," ICDIM 2008, Third International Conference, Nov. 2008.
[2] Md. Firoz Mridha, Md. Nurul Huda, Chowdhury Mofizur Rahman, Jugal Krishna Das, "Development of Morphological Rules for Bangla Root, Verbal Suffix and Primary Suffix for Universal Networking Language," 6th International Conference on Electrical and Computer Engineering (ICECE 2010), Dhaka, Bangladesh, 2010.
[3] Md. Kamrul Hasan, "Causative Constructions in Kok-borok," The Dhaka University Journal of Linguistics, vol. 2, no. 4, pp. 115-137, Aug. 2009.
[4] M. M. Asaduzzaman, M. M. Ali, "Morphological Analysis of Bangla Words for Automatic Machine Translation," International Conference on Computer and Information Technology (ICCIT), Dhaka, 2003, pp. 271-276.
[5] Binoy Debbarma, "Learn Kokborok in Three Months," Language Wing, Education Dept., TTAADC, Khumulwng, Tripura.
[6] http://en.wikipedia.org/wiki/Morphology_%28linguistics%29
[7] http://www.undl.org/