CENG 463: Introduction to Natural Language Processing

Term Paper and Project Proposal
Submitted by: Nazim YENIER
Topic of interest: Morphological Analysis, Spell Checking
Project partner: Hakan CEYLAN
Reference papers:
(1) Morphological Productivity in the Lexicon
Proc. of the 1996 ACL SIGLEX Workshop at Santa Cruz.
-- O. Sehitoglu, C. Bozsahin
(2) An Outline of Turkish Morphology
-- K. Oflazer, E. Gocmen, C. Bozsahin
We are mostly interested in morphological analysis for our project. As the
reference papers we have chosen show, and as is clear to every Turkish speaker,
Turkish word morphology is very complex but has many computationally clear
rules for constructing words. This property makes it a very convenient target
for computational research.
What we first need is a clear, accurate guide to the morphemes and rules
used to construct Turkish words. ‘An Outline of Turkish Morphology’ (2)
is a strong base for us to start from. Although this paper contains some very
small errors and incorrect derivations concerning Turkish morphemes, the way it
describes these morphemes and the rules for combining them is very powerful
and well designed for any computational purpose.
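To make concrete what a computationally clear rule looks like, here is a minimal
sketch of a single rule: the plural suffix, which surfaces as -lar after a back
vowel and -ler after a front vowel. This is our own illustration, not the notation
of paper (2); the function names are hypothetical, and the sketch assumes
lowercase native roots that follow vowel harmony.

```python
# A sketch of one suffixation rule (plural -lAr) with two-way vowel harmony.
# Names are hypothetical; input is assumed to be a lowercase native root.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def last_vowel(stem):
    """Return the last vowel of the stem, or None if it has no vowel."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def pluralize(stem):
    """Attach -lar after a back vowel and -ler after a front vowel."""
    vowel = last_vowel(stem)
    if vowel is None:
        raise ValueError("stem has no vowel: " + repr(stem))
    return stem + ("lar" if vowel in BACK_VOWELS else "ler")
```

For example, pluralize("yol") gives "yollar" and pluralize("göz") gives "gözler".
Borrowed words that break harmony are not handled by this sketch.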
For our project, we plan to design an editor-like program that takes some text in
Turkish, parses the words in the text, and spell-checks them. We are also very eager
to go further and add a translation tool to our editor. If a word is correctly parsed
and found to be correctly spelled, the user will be able to ask for its translation
into English. The translation part of our project will be limited to single words;
we cannot, and do not plan to, translate whole sentences.
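The intended flow (parse a word, accept it only if it parses, then offer a
word-level translation) can be sketched as follows. This is only an illustration
under strong assumptions: the four roots, their glosses, and all function names
are hypothetical, the only suffix handled is the plural, and input is assumed to
be lowercase.

```python
# Toy pipeline: parse -> spell-check -> word-level translation.
# The lexicon entries and all names are hypothetical illustrations.
LEXICON = {"ev": "house", "göz": "eye", "kuş": "bird", "yol": "road"}
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def plural_suffix_for(root):
    """Pick -lar or -ler from the last vowel of the root (vowel harmony)."""
    for ch in reversed(root):
        if ch in BACK_VOWELS:
            return "lar"
        if ch in FRONT_VOWELS:
            return "ler"
    return None


def parse(word):
    """Return (root, features) for a bare root or root + plural, else None."""
    if word in LEXICON:
        return word, []
    for root in LEXICON:
        suffix = plural_suffix_for(root)
        if suffix is not None and word == root + suffix:
            return root, ["PLURAL"]
    return None


def check_and_translate(word):
    """Return an English gloss if the word parses, or None if it is rejected."""
    analysis = parse(word)
    if analysis is None:
        return None  # not analysable: treated as misspelled or unknown
    root, features = analysis
    gloss = LEXICON[root]
    return gloss + " (plural)" if "PLURAL" in features else gloss
```

Here check_and_translate("evler") returns "house (plural)", while "evlar" is
rejected because its suffix violates vowel harmony. A word outside the toy
lexicon is rejected in the same way, so this sketch cannot tell a misspelling
from an unknown root.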
Of course, we need to build a lexicon for our project. Our lexicon will probably be
limited to one to two hundred roots (verbs and nouns), since preparing a large
lexicon for Turkish would take too much time and is outside our scope. On the
other hand, the lexicon will contain almost all the suffixes used to construct
verbs and nouns. While choosing words for our lexicon, we will try to avoid all
kinds of words that are likely to create exceptions, and we will not accept any
word that Turkish has borrowed from other languages.