
MORPHOLOGICAL ANALYSIS OF KOKBOROK FOR
UNIVERSAL NETWORKING LANGUAGE DICTIONARY
Swapan Debbarma,
Assistant Professor, Department of Computer Science
National Institute of Technology
Agartala, India
swapanxavier@gmail.com
Khumbar Debbarma, Braja Gopal Patra
M.Tech Students, Department of Computer Science
National Institute of Technology
Agartala, India
khum_10jan@yahoo.co.in
brajagopal_patra@yahoo.com
Abstract-This paper focuses on the morphological analysis of Kokborok words, with the aim of
incorporating them into a Kokborok dictionary and a Kokborok machine translator. So far,
no attempt has been made to integrate existing work into a concrete computational output.
In this paper we emphasize bringing work on morphological analysis into a single frame,
with the goal of producing a Kokborok dictionary and a machine translator that provide a
unified base fitting the already developed universal conversion systems of UNL. We explain
the morphological rules of Kokborok words for UNL structures. These rules expose how parts
of speech are modified with regard to tense, person, subject, etc. of the words in a
sentence. Here we outline the morphology of nouns, verbs and adjective phrases only.
Keywords-Morphology, Kokborok, Universal Networking Language (UNL), UNL-Kokborok dictionary.
I. INTRODUCTION
Kokborok is the native language of the Tripuris of Tripura, one of the North-East states of
India. Tripura is surrounded by the neighbouring states of Assam and Mizoram and by the
country of Bangladesh. Many Tripuris also live and speak Kokborok in Myanmar, Nepal, Bhutan
and Bangladesh, numbering about 2.5 million [3][5]. The Kokborok language is highly
systematic, and its script is similar to the Roman script. The language belongs to the Bodo
sub-group of the Tibeto-Burman language family and has unique features compared with other
South-Asian Tibeto-Burman languages. Because of the language barrier, ordinary speakers face
a major obstacle to benefiting fully from modern information and communication technology
(ICT) and from the large body of English-language knowledge available around the globe. The
UNL, a formal language for representing the meaning of natural language sentences, is a
specification for the exchange of information. UNL includes 16 languages: the six official
languages of the United Nations (Arabic, Chinese, English, French, Russian and Spanish) and
ten other widely spoken languages (German, Hindi, Italian, Indonesian, Japanese, Latvian,
Mongol, Portuguese, Swahili and Thai). Like Kokborok, some of the languages included in UNL,
such as Chinese and Thai, belong to the Sino-Tibetan language group. Including Kokborok in
UNL will therefore facilitate the interaction of Kokborok-speaking people with others.
Morphology is the field of linguistics that studies the structure of words. It focuses on
patterns of word-formation within and across languages, and attempts to formulate rules that
model the knowledge of the speakers of those languages. Three approaches to the study of
morphology are distinguished [6]:
• Morpheme-based morphology, which makes use of an Item-and-Arrangement approach.
• Lexeme-based morphology, which normally makes use of an Item-and-Process approach.
• Word-based morphology, which normally makes use of a Word-and-Paradigm approach.
The first one is the most common, and we concentrated our efforts on that.
Morpheme-based morphology: Here word forms are analyzed as arrangements of morphemes. A
morpheme is defined as the minimal meaningful unit of a language. In a word like
independently, the morphemes are in-, depend, -ent and -ly; depend is the root and the
other morphemes are, in this case, derivational affixes. In a word like dogs, dog is the
root and -s is an inflectional morpheme. This way of analyzing word forms, which treats
words as if they were made of morphemes put one after another like beads on a string, is
called Item-and-Arrangement [6].
Morphological analysis is thus centered on the analysis and generation of word forms. It
deals with the internal structure of words and with how words can be formed. Morphology
plays an important role in applications such as spell checking, electronic dictionary
interfaces and information retrieval systems, where it is important that words that are
only morphological variants of each other are identified and treated similarly. In natural
language processing (NLP) and machine translation (MT) systems we need to identify words in
texts in order to determine their syntactic and semantic properties [4]. Morphological
study helps here by providing rules for analyzing the structure and formation of words.
II. KOKBOROK MORPHEMES TO BE REPRESENTED AS
Each Kokborok morpheme, as well as each root word, is to be represented in the
Kokborok-UNL dictionary using the following UNL format.
[HW]{ID}"UW"(ATTRIBUTE, ATTRIBUTE, …)<FLG, FRE, PRI>
HW: Head Word (Kokborok word)
ID: Identification of Head Word (omittable)
UW: Universal Word
ATTRIBUTE: Attribute of the HW
FLG: Language Flag
FRE: Frequency of Head Word
PRI: Priority of Head Word
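The entry format above can be read mechanically. The following Python fragment is a minimal sketch, not part of the system described in this paper: the names ENTRY_RE, UnlEntry, parse_entry and format_entry are hypothetical, and the sketch assumes that entries are written with straight ASCII quotes and numeric frequency and priority fields.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed shape of one entry: [HW]{ID}"UW"(ATTR, ATTR, ...)<FLG,FRE,PRI>
ENTRY_RE = re.compile(
    r'\[(?P<hw>[^\]]*)\]\s*'      # head word
    r'\{(?P<id>[^}]*)\}\s*'       # identification (may be empty)
    r'"(?P<uw>[^"]*)"\s*'         # universal word, may contain (icl>...)
    r'\((?P<attrs>[^)]*)\)\s*'    # comma-separated attributes
    r'<(?P<flg>[^,>]*),(?P<fre>[^,>]*),(?P<pri>[^,>]*)>'
)


@dataclass
class UnlEntry:
    head_word: str
    entry_id: str
    universal_word: str
    attributes: List[str]
    flag: str
    frequency: int
    priority: int


def parse_entry(line: str) -> UnlEntry:
    """Split one dictionary line into its fields (hypothetical helper)."""
    m = ENTRY_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a UNL dictionary entry: {line!r}")
    attrs = [a.strip() for a in m["attrs"].split(",") if a.strip()]
    return UnlEntry(m["hw"], m["id"], m["uw"].strip(), attrs,
                    m["flg"].strip(), int(m["fre"]), int(m["pri"]))


def format_entry(e: UnlEntry) -> str:
    """Write the fields back out in the same one-line layout."""
    attrs = ",".join(e.attributes)
    return (f'[{e.head_word}]{{{e.entry_id}}}"{e.universal_word}"'
            f'({attrs})<{e.flag},{e.frequency},{e.priority}>')


entry = parse_entry('[rangchak]{}"gold(icl>thing)"(N)<K,0,0>')
print(entry.head_word, entry.universal_word, entry.attributes)
```

Keeping the parser and the formatter as a pair makes it easy to check that an entry survives a round trip unchanged.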
The attributes describe the nature of the head word, classifying it as a grammatical,
semantic or morphological feature. So we will be specially concerned with the
representation of morphemes using various attributes. We have the following two divisions
for presenting the morphological structure of Kokborok words.
A. Morphological analysis of simple Kokborok words
Traditional grammars of Kokborok recognize five parts of speech. The parts of speech
classification plays a central role in explaining the structure of a sentence. So we have
to deal with five different types of words in Kokborok sentences, i.e. noun, adjective,
pronoun, preposition and verb; in this section we concentrate on nouns, adjectives and
verbs only. Two different types of morphology are recognized for simple Kokborok
words [4].
1) Inflectional Morphology
Inflectional morphology derives a word form from another word form, adding certain
grammatical features while keeping the same part of speech or category. There are a
number of inflectional suffixes indicating the number of the nouns and pronouns of a
sentence. Here we explain the following five different types of morphology.
TABLE I. EXAMPLES OF NOUN MORPHOLOGY
Number   | Root word   | Word as it appears in a sentence
Singular | chwla (boy) | chwlano (to the boy)
• Nouns. Kokborok has a strong and regular inflectional morphology for its nouns, based on
case. The case of a noun may be nominative (“chwla”, boy), accusative (“chwla-no”, to the
boy), genitive (“chwla-ni”, of the boy), locative (“nog-o”, in/at house) and so on. Gender
and number are also important for identifying the proper categories of nouns. Number may be
singular (“ri”, cloth) or plural (“rirok”, clothes). The gender of nouns can be masculine
(“takhuk”, brother), feminine (“bukhuk”, sister), common (“cherai”, child) or neuter
(“swikong”, pen). We consider six different types of nouns and show a possible
representation of their inflectional suffixes in the UNL format. Some examples of the
analysis of nouns, based on number, are shown in Table I.
• Proper noun. Names of specific things, e.g. Khumbar (name of a person) and Saidra
(name of a river), belong here. For example, in “Khumbarno twi ridi” (give water to
Khumbar), “no” is the inflectional morpheme.
[Khumbar]{}"Khumbar(iof>person)"(N,FEMALE,ANI,3SG)<K,0,0>
[no]{}"no"(NMOR)<K,0,0>
• Common noun. Common names of things, e.g. bufang (tree), aa (fish), borok (man), wak
(pig), twi (water), are considered here. Common nouns can be singular or plural. Examples
are “bufangrok tanwi khibikha” (trees have been cut), “bufango muisurum kwbangma” (many
ants on the tree) and “bufangni uklogo huijakdi” (hide behind the tree). Here “rok”, “o”
and “ni” are inflectional morphemes.
[bufang]{}"tree(icl>plantation)"(N)<K,0,0>
[rok]{}"rok"(NMOR)<K,0,0>
• Material noun. Names of materials, e.g. rangchak (gold), watwi (rain). Examples are
“rangchakni motok” (golden crown) and “watwio ta sijakdi” (don’t get wet in the rain);
here “ni” and “o” are inflectional morphemes.
[rangchak]{}"gold(icl>thing)"(N)<K,0,0>
[ni]{}"ni"(NMOR)<K,0,0>
• Abstract noun. These nouns represent names of qualities, e.g. tongthokma (happiness),
khakhamma (sadness), wanama (tense), kirima (fear). For example, in “ang rwchabwi
tongthokma mano” (I get happiness from singing), “ma” is the inflectional morpheme.
[tongthok]{}"happiness(icl>state)"(N)<K,0,0>
[ma]{}"ma"(NMOR)<K,0,0>
• Verbal noun. These are nouns formed from verbs, e.g. “ang bazaro thangna nango” (I
need to go to market) and “ang khwna rwchabna nango”. Here “na” is the inflectional
morpheme.
[thang]{}"go(icl>do)"(V)<K,0,0>
[na]{}"na"(NMOR)<K,0,0>
• Collective noun. These nouns indicate collective names of people or things, e.g. borok
(people), dopha (clan), hoda (society). For example, in “chini borokrokno sadi” (tell our
people), “rokno” is the inflectional morpheme.
[borok]{}"people(icl>human)"(N)<K,0,0>
[rokno]{}"rokno"(NMOR)<K,0,0>
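The noun examples above all pair a root with a suffix that the dictionary lists separately. The following Python fragment is a minimal sketch of how such a split could be computed by longest-suffix matching. It is an illustration under stated assumptions, not the procedure used in this paper: NOUN_SUFFIXES and KNOWN_ROOTS are hypothetical lists filled only with forms quoted in this section, and segment_noun is a hypothetical helper.

```python
from typing import Optional, Tuple

# Suffixes quoted in the noun examples of this section (illustrative subset).
NOUN_SUFFIXES = ("rokno", "rok", "no", "ni", "o")

# Roots quoted in this section; a real dictionary would supply these.
KNOWN_ROOTS = {"chwla", "nog", "ri", "bufang", "rangchak", "watwi",
               "borok", "Khumbar"}


def segment_noun(word: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (root, suffix), (root, None) for a bare root, or None."""
    if word in KNOWN_ROOTS:
        # Checked first so that a root such as "borok" is not cut at "rok".
        return word, None
    # Try longer suffixes first so that "rokno" wins over "no".
    for suffix in sorted(NOUN_SUFFIXES, key=len, reverse=True):
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in KNOWN_ROOTS:
            return root, suffix
    return None


for w in ("chwlano", "bufangrok", "rangchakni", "borokrokno", "borok"):
    print(w, segment_noun(w))
```

Checking the root against the dictionary before accepting a split is what keeps a suffix-like ending inside a root from being stripped by mistake.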
• Adjective morphology. Inflectional affixes are also present in Kokborok adjectives; in
the examples given here they are prefixes. Some examples are given in Table II.
TABLE II. EXAMPLES OF ADJECTIVE MORPHOLOGY
Root word    | Word as it appears in a sentence | Inflectional prefix
sok (to rot) | kosok (rotten)                   | ko
Here
[ham]{}"good(iof>quality)"(N)
[ka]{}"ka"(ADJMORPH)
[sok]{}"rot(icl>do)"(V)
[ko]{}"ko"(ADJMORPH)
TABLE III. MORPHOLOGY OF ROOT VERB CHA (EAT) AND KHAI (DO)
Person | Tense              | Verb as it appears               | Inflectional suffix
First  | Present            | chao / khaio                     | o
       | Present continuous | chawi tongo / khaiwi tongo       | witongo
       | Past               | chakha / khaikha                 | kha
       | Past continuous    | chawi tongmani / khaiwi tongmani | witongmani
       | Past perfect       | chamani / khaimani               | mani
Verb morphology in Kokborok is highly diverse. For example, if we take ‘cha’ as the root
word, then adding ‘kha’ gives ‘chakha’, which means the action was done in the past.
Similarly, adding ‘witongo’ means the action is being done in the present, and adding
‘witongmani’ means the action was being done in the past. We applied morphological
analysis for different persons to find the actual meaning of the word. Some data for the
root verbs cha (eat) and khai (do) are shown in Table III.
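The first-person forms in Table III follow from attaching one suffix per tense to the root. The Python fragment below is a minimal sketch of that generation step, assuming only the data in Table III; the names VERB_SUFFIXES, WRITTEN_FORM, inflect and paradigm are hypothetical. Table III writes the continuous suffixes with a space after "wi" ("chawi tongo"), and the sketch reproduces that spelling through a small lookup rather than through any phonological rule.

```python
# First-person suffixes taken from Table III.
VERB_SUFFIXES = {
    "present": "o",
    "present continuous": "witongo",
    "past": "kha",
    "past continuous": "witongmani",
    "past perfect": "mani",
}

# Spelling of the continuous suffixes as they appear in Table III.
WRITTEN_FORM = {
    "witongo": "wi tongo",
    "witongmani": "wi tongmani",
}


def inflect(root: str, tense: str) -> str:
    """Attach the suffix for `tense` to `root`, as written in Table III."""
    suffix = VERB_SUFFIXES[tense]
    return root + WRITTEN_FORM.get(suffix, suffix)


def paradigm(root: str) -> dict:
    """Return every tense form of `root` listed in Table III."""
    return {tense: inflect(root, tense) for tense in VERB_SUFFIXES}


for tense, form in paradigm("cha").items():
    print(f"{tense}: {form}")
```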
As an example we take the Kokborok word “swi”, which means write. Its dictionary entry is
[swi]{}"write(icl>do)"(list of semantic and syntactic attributes)<K,0,0>, where “swi” is
the head word. Some possible transformations of “write” in the Kokborok UNL dictionary are
shown below.
//For First person
[swi]{}"write(icl>do)"(V,@present,indf)
[-o]{}"o"(VMOR,@present,indf)
[-witongo]{}"witongo"(VMOR,@present,cont)
[-kha]{}"kha"(VMOR,@past,indf)
[-witongmani]{}"witongmani"(VMOR,@past,cont)
[-mani]{}"mani"(VMOR,@past,perf)
[-jago]{}"jago"(VMOR,@fact)
[-nai]{}"nai"(VMOR,@future,indf)
//For Second person
[-di]{}"di"(VMOR,@present,indf)
[-kha]{}"kha"(VMOR,@past,indf)
[-nai]{}"nai"(VMOR,@future,indf)
//For Third person
[-o]{}"o"(VMOR,@present,indf)
[-witongo]{}"witongo"(VMOR,@present,cont)
[-kha]{}"kha"(VMOR,@past,indf)
[-witongmani]{}"witongmani"(VMOR,@past,cont)
[-mani]{}"mani"(VMOR,@past,perf)
[-nai]{}"nai"(VMOR,@future,indf)
Such a dictionary order, with the root word followed by its derivations, helps in quickly
finding the UW and the attributes of a Kokborok word.
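The lookup that this ordering supports can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than the paper's implementation: ROOTS, FIRST_PERSON_SUFFIXES and analyze_verb are hypothetical names, the data are limited to the first-person entries for “swi” listed above, and the attribute spellings are normalized (for example "indf" without a trailing period).

```python
from typing import List, Optional, Tuple

# Root entry: head word -> (universal word, attributes of the root).
ROOTS = {"swi": ("write(icl>do)", ["V"])}

# First-person verbal suffix entries: suffix -> attributes.
FIRST_PERSON_SUFFIXES = {
    "o": ["VMOR", "@present", "indf"],
    "witongo": ["VMOR", "@present", "cont"],
    "kha": ["VMOR", "@past", "indf"],
    "witongmani": ["VMOR", "@past", "cont"],
    "mani": ["VMOR", "@past", "perf"],
    "jago": ["VMOR", "@fact"],
    "nai": ["VMOR", "@future", "indf"],
}


def analyze_verb(word: str) -> Optional[Tuple[str, List[str]]]:
    """Return (UW, attributes) for a root or root+suffix form, else None."""
    for root, (uw, root_attrs) in ROOTS.items():
        if word == root:
            return uw, list(root_attrs)
        if word.startswith(root):
            rest = word[len(root):]
            if rest in FIRST_PERSON_SUFFIXES:
                # Root attributes first, then those contributed by the suffix.
                return uw, root_attrs + FIRST_PERSON_SUFFIXES[rest]
    return None


print(analyze_verb("swikha"))
print(analyze_verb("swiwitongmani"))
```

Because the suffix entries sit directly after their root, one pass over the root's block is enough to resolve an inflected form.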
2) Derivational Morphology
This is the morphology of words similar to compound words: words formed by combining two
or more words that may belong to different parts of speech. For example, in “Tokkoknai”,
“Tok” (bird) is a noun and “Koknai” (shooter) is also a noun, i.e. Noun+Noun=Noun. In
“amingbolong” (wildcat), “aming” is a noun and “bolong” is an adjective, i.e.
Noun+Adjective=Noun.
B. Morphological Analysis of compound Kokborok words
Compound words are those words which contain more than one root word. Different
types of compound words are given below.
• Noun+Noun=Noun: “Ma-Pha” (mother and father)
• Noun+Adjective=Adjective: “Bwkhnai kuphur” (hair white/grey hair)
• Pronoun+Noun+Noun=Noun: “Chini-ha-ama” (our-mother-land)
C. Addition of Prefixes with root word in compound Kokborok words
In Kokborok, some compound words are formed from a root word by the addition of prefixes,
and the prefixes change according to the person. We have taken the root word “ChwiChu”
(grandmother and grandfather).
III. CONCLUSION AND FUTURE WORK
This was quite challenging work, as very limited or no work has been done on Kokborok. In
this paper we have analyzed the morphology of Kokborok and outlined a procedure for
developing a Kokborok dictionary according to the UNL format. We have implemented some
rules to develop the morphological analysis of simple and compound Kokborok words that can
be used to make a UNL-Kokborok dictionary.
Our future plan in this regard is to
• Develop morphological rules for Kokborok roots, verbal suffixes and primary suffixes.
• Structure the dictionary entries of Kokborok morphemes for rule generation.
REFERENCES
[1] Md. Nawab Yousuf Ali, S. M. Abdullah Al-Mamun, Jugal Krishna Das, Abu
Mohammad Nurannabi, “Morphological Analysis of Bangla Words for Universal
Networking Language”, ICDIM 2008, Third International Conference, Nov. 2008.
[2] Md. Firoz Mridha, Md. Nurul Huda, Chowdhury Mofizur Rahman, Jugal Krishna Das,
“Development of Morphological Rules for Bangla Root, Verbal Suffix and Primary
Suffix for Universal Networking Language”, 6th International Conference on Electrical
and Computer Engineering (ICECE 2010), Dhaka, Bangladesh.
[3] Md. Kamrul Hasan, “Causative Constructions in Kok-borok”, The Dhaka University
Journal of Linguistics, Vol. 2, No. 4, August 2009, pp. 115-137.
[4] M. M. Asaduzzaman, M. M. Ali, “Morphological Analysis of Bangla Words for
Automatic Machine Translation”, International Conference on Computer and
Information Technology (ICCIT), Dhaka, 2003, pp. 271-276.
[5] Binoy Debbarma, “Learn Kokborok in Three Months”, Language Wing, Education
Dept., TTAADC, Khumulwng, Tripura.
[6] http://en.wikipedia.org/wiki/Morphology_%28linguistics%29
[7] http://www.undl.org/