CENG 463: Introduction to Natural Language Processing
Term Paper and Project Proposal

Submitted by: Nazim YENIER
Topic of interest: Morphological Analysis, Spell Checking
Project partner: Hakan CEYLAN

Reference papers:
(1) Morphological Productivity in the Lexicon, Proc. of the 1996 ACL SIGLEX Workshop at Santa Cruz -- O. Sehitoglu, C. Bozsahin
(2) An Outline of Turkish Morphology -- K. Oflazer, E. Gocmen, C. Bozsahin

We are mostly interested in morphological analysis for our project. As the reference papers we have chosen show, and as is clear to every Turkish speaker, Turkish word morphology is very complex but has many computationally clear rules for constructing words. This property makes it a very convenient target for computational research.

What we need first is a clear, flawless guide to the morphemes and rules that are used to construct Turkish words. ‘An Outline of Turkish Morphology’ (2) is a strong base to start from. Although this paper contains some very small errors and incorrect derivations of Turkish morphemes, the way it describes these morphemes and the rules for combining them is very powerful and well designed for any computational purpose.

For our project, we plan to design an editor-like program that takes text in Turkish, parses the words in the text and spell-checks them. We are very eager to go further and add a translation tool to our editor. If a word is correctly parsed and found to be correctly spelled, the user will be able to ask for its English translation. The translation part of our project will be limited to single words; we cannot and do not plan to translate whole sentences.

Of course, we need to build a lexicon for our project. Our lexicon will probably be limited to one to two hundred roots (verbs and nouns), since preparing a large lexicon for Turkish would take too much time and is also outside our scope.
On the other hand, the lexicon will contain almost all the suffixes used to construct verbs and nouns. While choosing words for our lexicon, we will try to avoid all kinds of words that are likely to create exceptions, and we will not accept any word that Turkish has borrowed from other languages.
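
The parse-then-check idea described above can be sketched as follows. This is only a minimal illustration in Python under stated assumptions, not the design of the planned program: the four-root lexicon ROOTS, the function names, and the restriction to two noun suffixes (plural -lAr and locative -DA, applied in that order) are hypothetical choices made for the example. Input is assumed to be already lowercase.

```python
# Toy sketch: accept a word only if it is a known root followed by
# suffixes whose surface forms obey vowel harmony. All names and the
# tiny lexicon below are hypothetical, chosen only for illustration.

FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")
VOICELESS = set("çfhkpsşt")

# Hypothetical lexicon: regular native roots -> English gloss.
ROOTS = {"ev": "house", "göl": "lake", "kuş": "bird", "yol": "road"}


def last_vowel(stem):
    """Return the last vowel of the stem, or None if it has no vowel."""
    for ch in reversed(stem):
        if ch in FRONT_VOWELS or ch in BACK_VOWELS:
            return ch
    return None


def plural_suffix(stem):
    """Surface form of -lAr: 'ler' after a front vowel, else 'lar'."""
    return "ler" if last_vowel(stem) in FRONT_VOWELS else "lar"


def locative_suffix(stem):
    """Surface form of -DA: 't' after a voiceless consonant, else 'd',
    followed by 'e' or 'a' according to vowel harmony."""
    consonant = "t" if stem[-1] in VOICELESS else "d"
    vowel = "e" if last_vowel(stem) in FRONT_VOWELS else "a"
    return consonant + vowel


def analyze(word):
    """Return (root, tags, gloss) if the word parses, otherwise None."""
    for root, gloss in ROOTS.items():
        if not word.startswith(root):
            continue
        stem, rest, tags = root, word[len(root):], []
        # Each suffix is optional, but the order is fixed: plural, locative.
        for tag, make_suffix in (("PLURAL", plural_suffix),
                                 ("LOCATIVE", locative_suffix)):
            suffix = make_suffix(stem)
            if rest.startswith(suffix):
                tags.append(tag)
                stem += suffix
                rest = rest[len(suffix):]
        if rest == "":
            return root, tags, gloss
    return None  # no root/suffix combination matched: flag as misspelled
```

With this toy lexicon, analyze("evlerde") gives the root "ev" with the tags PLURAL and LOCATIVE and the gloss "house", while analyze("evlar") returns None because the plural suffix violates vowel harmony. A word that parses can be offered for translation through its gloss; a word that does not parse is reported as a spelling error.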