Arabic Morphology Template Grammar-based Hassanin M. Al-Barhamtoshy, Khalid O. Thabit and Basil. Ba-Aziz KAU, Faculty of Computing and Information Technology, Jeddah Abstract This research presents a multi natural language processing model to be used in machine translation and language processing systems. We will describe problems of analysis, taken into our consideration ambiguity (lexically and syntactically). Different types of linguistic and non-linguistic knowledge are necessary to resolve these problems of ambiguity, and in this research we examine in more detail how to represent this knowledge. In addition, the research describes a system for generating natural-language sentences from syntax and lexical structures, taken into our point of view an internal (or interlingual) representation. Such model will be developed as part of an Arabic– English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. Consequently, the contributions of this work include building dictionary to be used in automatic translation. 1. Introduction To make a good natural language processing (NLP) in translation models, the following subsection describes different sub-models of the NLP. 1.1. Dictionary Dictionaries are the largest components of machine translation (MT: or automatic translation) systems in terms of the amount of information they hold. If they are more then simple word lists, then they may well be the most expensive components to construct [1-3]. Consequently, a user can make some additions to system dictionaries to make a system useful. One aspect point of view get an idea of the dictionary information size that may be needed for commercial purposes a lexicon with 20 000 entries is often considered as the minimum. However existing dictionary contains words - the Oxford English Dictionary contains about 250 000 entries without being exhaustive even of general usage. In a matter of fact, no dictionary can ever be complete [1, 2]. 1 1.2. Word Types It is useful to make a distinction between the characteristics of a word and its inherent properties with respect to its places (in sentence) in its grammatical environment. Each word has type with respect to its morphological analysis. Although this types include grammatical properties, like the indication of gender in some languages (the Arabic or the French part of the bilingual dictionary entry), and the indication of number on nouns. Typically, the citation form of nouns is the singular form [1-5]. 1.3. Dictionaries and Morphology Morphology means the internal structure of words, and how words can be formed. In Arabic it is usual to categorize three different word formation processes [1,4,7]: 1 Inflectional processes, by means of which a word is derived from another word form, acquiring certain grammatical features but maintaining the same part of speech or category (e.g. walk, walks); 2 Derivational processes in which a word of a different category is derived from another word or word stem by the application of some process (e.g. grammar grammatical, grammatical grammaticality); 3 Compounding, in which independent words come together in some way to form a new sentence unit, (in Arabic )شكرناهم. In Arabic, inflectional and derivational processes involve prefixes (as in )فنشككرand suffixes (as in )شككرناهم, and what is called pronouns inflection or subword. In other languages, a range of devices such as changes in the vowel patterns of words, doubling or reduplication of syllables, etc., are also found. Clearly, these prefixes and suffixes (collectively known as affixes) cannot "stand alone" as words. Compounding is quite different in that the parts can each occur as individual words. 1.4. Ambiguity Most Natural Language Processing is concerned with only one meaning. However, as we all know, this is not the case. When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous [4,5]. 2 1.5. Semantic Semantic is concerned with the meaning of words and how they combine to form sentence meanings [5]. It is useful to distinguish lexical semantics, and structural semantics- the former is to do with the meanings of words, the latter to do with the meanings of phrases, including sentences [6]. There are many ways of thinking about and representing word meanings, but one that has proved useful in the field of machine translation involves associating words with semantic features which correspond to their sense components. For example, the words man, woman, boy, and girl might be represented as [1, 5, 6]: man = (+HUMAN, +MASCULINE and +ADULT) woman = (+HUMAN, -MASCULINE and +ADULT) boy = (+HUMAN, +MASCULINE and -ADULT) girl = (+HUMAN, -MASCULINE and -ADULT) In case of designing an Arabic translation dictionary, it must be professional in linguist's translation. The following figures (1 and 2) give example as case studies for English to Arabic and French to English translation examples [6]. Fig. (1): English to Arabic simple translator Fig. (2): French to English to simple translator 2. Building Arabic Dictionary In information retrieval systems, such as CLIR, queries in one language retrieve relevant documents in other languages Machine-Readable Dictionary (MRD) and Machine Translation (MT) are important resources for query translation in CLIR[8]. Mohammed Aljlay and et al investigate MT and MRD to Arabic-English CLIR. The 3 translation ambiguity associated with these resources is the key problem. They present three methods of query translation using a bilingual dictionary for Arabic-English CLIR [8]. Out of vocabulary (OOV) words are problematic for cross language information retrieval. One way to deal with OOV words when the two languages have different alphabets, is to transliterate the unknown words, that is, to render them in the orthography of the second language. In the present study, research of [9] presents a simple statistical technique to train English to Arabic transliteration model from pairs of names. Arabic requires good stemming for effective information retrieval due to highly inflected in derivations, yet no standard approach to stemming has emerged [10-13]. Several light stemmers is developed based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. The retrieval effectiveness of such stemmers are compared with morphological analyzer on the TREC-2001 data [10]. The inflectional structure of word affects the retrieval accuracy of information retrieval systems of Latin-based languages. Different stemming algorithms for Arabic information retrieval systems are presented [11-18]. The effectiveness of surfacebased retrieval is also investigated. This approach degrades retrieval precision since Arabic is a highly inflected language. Therefore, root-based retrieval model is proposed [11]. Also, a statistically significant improvement over the surface-based approach noticed. Arabic inflectional morphology requires infixation, prefixation and suffixation, giving rise to a large space of morphological variation [12]. In this project an approach is described to reducing the complexity of Arabic morphology generation using grammar-based rules. By decoupling the problem of stem changes from that of prefixes and suffixes, significant reduction is gained in addition to the number of rules required, as much as a factor of three for certain verb types [18]. Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English to compare them with the topic models. A native language hypothesis proposed stating that comparisons would be more effective in the original language of the story [21]. 4 Due to the high number of inflectional variations of Arabic words, empirical results suggest that stemming is essential for Arabic information retrieval. However, current light stemming algorithms do not extract the correct stem of irregular (so-called broken) plurals, which constitute ~10% of Arabic texts and ~41% of plurals. Although light stemming in particular has led to improvements in information retrieval [22]. There have been advances in Cross-Language Information Retrieval (CLIR) in recent years. One of the major remaining reasons that CLIR does not perform as well as monolingual retrieval is the presence of out of vocabulary (OOV) terms. Previous work either has relied on manual intervention or has only been partially successful in solving this problem. Method is used to extend earlier work in this area by augmenting this with statistical analysis, and corpus-based translation [23]. In another paper, a system that recognizes place names in natural language text is described to produce geographic maps and animations showing the geographical coverage of texts about a certain subject as it changes over time. As the system is built to analyze texts in many different languages, it restricts the usage of linguistic analysis tools to the minimum. Instead, it relies on a gazetteer (geo dictionary) containing place names in different languages and uses heuristics for disambiguation purposes [24]. A methodology for implementing natural language morphology in the functional language Haskell introduced in [25]. The main idea behind is simple as stated in [25], instead of working with un-typed regular expressions, which is the state of the art of morphology in computational linguistics, finite functions and algebraic data types are used. The definitions of these data types and functions are the language-dependent part of the morphology. For cross language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends upon lexical coverage in the dictionary. This is especially true for languages possessing few inter-language cognates, such as between Japanese and English. In the article of [26], it describes a method for automatically creating and validating candidate Japanese transliterated terms of English words. A phonetic English dictionary and a set of probabilistic mapping rules are used [26]. As participants in the TIDES Surprise language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross5 language information retrieval system. Components included normalization, stopword removal, transliteration, structured query translation, and language-modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi [27]. A novel two-step fuzzy translation technique is presented for cross-lingual spelling variants. In the first stage, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second stage, the intermediate forms obtained in the first stage are translated into a target language using fuzzy matching [28]. While many investigations have explored the use of query expansion techniques to combat errors induced by translation, no study has yet examined the effectiveness of these techniques across resources of varying quality. This paper presents results using parallel corpora and bilingual wordlists that have been deliberately degraded prior to query [29]. A cross-lingual, question-answering (CLQA) system for Hindi and English are developed [30]. It accepts questions in English, finds candidate answers in Hindi newspapers, and translates the answer candidates into English along with the context surrounding each answer. The system was developed as part of the surprise language exercise (SLE) within the TIDES program [30]. 3. Proposed Model System Structure The proposed model includes the following rules: Step 1: The Arabic words are looked up in an Arabic electronic dictionary, and then employees the morphological component that contains specific rules that deal with the regularities of inflection. The appropriate category (for example: noun or verb or special character) is assigned. Step 2: Some rules of an Arabic grammar are used to try to parse the entire words. Therefore, an advanced parser might work out that it is in fact a measure modifier. However, it is quite possible that the parser parses the entire word to find out its components (extract its implicit pronouns from affixes). This is 6 because the difference between the Arabic and some possible English translations is not great. Step 3: The Engine now applies source to target language (Arabic to English) transformation rules. The first step here is to find translations of the Arabic words in an Arabic to English dictionary. We can now summarize some of the distinctive design features of this engine: Input sentences are automatically parsed only as it is necessary for the successful operation using various morphological and lexical rules (structuredbased) and phrasal transformation rules. The transformer engine is often content to find out just a few incomplete pieces of information about the structure of some of the phrases in a sentence, and where the main verb might be. Morphological rules employed firstly, within all the possibilities of derivation rules for all the words inside sentences. In practice, transformer model takes some of analyzed features and then translate it into the target features. Thus in the Arabic to English transformer system, we assumed that the grammar covered only some features of Arabic. Syntactic rules takes such analyzed features in added to the extracted features, and therefore find the syntactic form of the sentence (surface representation). The Lexical rules are done to find out if there are meaning of such representation or not? The use of limited grammars and incomplete parsing means that transformer systems do not generally construct complex representations of input sentencesin many cases, not even the simplest surface constituent structure tree. Most of the engine's translational competence lies in the rules which transform bits of input sentence into bits of output sentence, including the bilingual dictionary rules. In a sense a transformer system has some knowledge of the comparative grammar of the two languages-of what makes the one structurally different from the other. The proposed model is based on bilingual dictionary. Therefore, we'll try to create a new dictionary based on the philosophy of Word.Net dictionary [31]. Consequently, reports on the design and model implementation will be illustrated and executed based 7 on bilingual Arabic/English dictionary. In a matter of fact, a relational database may be employed to store the syntactic and lexical indicators and conceptual relations. 3.1. Model Activity Diagram As described in many literatures, activity diagram shows the flow of control, using rounded rectangles. Figure 3 shows flow of control for the Find Root for a verb. All transitions between activities are represented by an arrow. Horizontal bars are used to simulate activities performed parallel. The model is based on Arabic template dictionaries (Arabic verb types, Roots and template patterns). Consequently, each rule will be illustrated according to the relational database dictionary. Start Read Arabic Word / Root Arabic Verb Type Arabic Verb Roots Templates Match Template Find Root for Arabic Verb Generate Vowlized Arabic Word Extract all derivative for root Extract Root,Root Type, Attributes End Fig. 3 : Find Arabic Root Activity Diagram 3.2. Generate Non Diacritic Arabic Word This function is to generate a non diacritic Arabic word from an input which is the Arabic root and the template for the word the following example can explain more Parameter Arabic Root Template Value شرب بون3 2 1 ي 8 As described later in 4 about the vowel and letter mask as following Table Letter Present 1 Present First Letter 2 Present Second Letter 3 Present Third Letter Any Arabic Letter Same Arabic Letter Symbol أبتثجحخدذرزسشصضطظعغفقكلمنهوي Extended Arabic letters إأآؤءئ The output will be Output Value يشربون Generate Non Diacritic Arabic word 3.3. Generate Diacritic Arabic Word This function is to generate a non diacritic Arabic word from an input which is the Arabic root and the template for the word the following example can explain more Parameter Value Arabic Root شرب Template يQ1 Q2 Q3 X As described later in 4.6 about the vowel and letter mask as following Table Letter Present Symbol 1 Present First Letter 2 Present Second Letter 3 Present Third Letter Q Fatha َ X Skoon َ Any Arabic Letter Same Arabic Letter أبتثجحخدذرزسشصضطظعغفقكلمنهوي Extended Arabic letters إأآؤءئ The output will be Output Value Generate Diacritic Arabic word يشرب 3.3. Extract the Root of Arabic word Extracting the root of Arabic using a little bit complex Algorithm which is using multiple functions and multiple mask, and in beginning the function should find the 9 Matched Templates to the input word. After that we remove all the non required characters and keep the original verb characters, the output will be all matched Roots. 3.4. Generate all possible derivative pattern Generating all possible derivative pattern uses different functions, at beginning we find the Type for the input root verb, and what kind of templates that applied to this verb, as example: verb TypeID Present RealRoot وفى 29 2 َوفَى شرب 1 1 َب َ ش ََر شرب 1 5 َب َ ش َِر As we see in the table the verb وفىthe verb type is 29 and the Present Type is 2 and which is matching only one table as following: 4. Arabic Template Rules of the Proposed Model The three operations of affixes (prefixes, infixes and suffixes) can be used to extract the roots from Arabic words using derivations templates. Also, the derived Arabic Words can be derived from Arabic roots after applying the three affixes templates. The input Arabic word is employed with the second input (affixes templates: called Mask) to find out the Arabic root, as shown in Figure 4. 10 Arabic Word Morphological Rules : AND, OR, XOR Arabic Root + Indicators Mask : Affixes Templates Fig. 4: Arabic Template Mask 4.1. Unsetting Rules One of the rules may use AND operator, others may use OR operator or XOR operators to do so, use an unsetting mask with the same character length. Consequently, such rules for extracting root can be summarized as follows: 1. To unset a character in input Arabic word, use 0 fore the corresponding character in the mask. 2. To leave a character in the input Arabic word unchanged, use 1 for the corresponding character in the mask. 3.Use the AND/OR/XOR operators to extract the Arabic root and additional indicators. To understand how these rules work, refer to figure 5 as an example I/P Word Root : = Output Prefix = Suffix = Infix = - AND operator 0 0 1 1 1 0 I/P Mask Template Fig. (5): Example of Arabic Template Mask for ()يشكرون 4.2. Setting Rules This rule is employed to find out another derivation of words after the first rule (unsetting rules) or sole. Therefore use a setting mask with the same manner except the OR operator is used instead of AND. The setting rules algorithm can be summarized as the following steps: 11 1. To set a character in the input word, use 1 for corresponding character in the mask. 2. To leave a character in the input word unchanged, use 0 for the corresponding character in the mask. To simplify those rules, refer to the characteristic of the OR operator as shown in figure (6) and assume that the input Arabic word is ()شككر. The mask should have stream of alternatives to find out all the possible derivations from the word ()شكر. I/P Word Derivation s O/P : e.g. Output … ... OR operator …. ... Suffixes Mask 0 0 0 Prefixes Mask … Infixes Mask Adding Masking Rules Fig.(6): Example of Arabic Template Mask for ( )شكرto find out its Arabic derivations 5. Results and Discussion There is a triliteral, quadrilateral, or pentaliteral Arabic verbs. Every Arabic verb has its own derivatives and these derivatives are depend on its type. About 30 types of the triliteral verbs contain 5321 verbs. This can produce 20000 templates. If affixes rules are applied for these templates (4 prefixes and 30 suffixes), therefore the total number of Arabic word derived from verbs are 28,140,005 derivations. 5.1 Testing Arabic artificial words were used in testing the proposed model. Such words include all their various possible derived verbs, nouns, adjectives, adverbs, etc and various 12 combinations of using affixes (prefixes, suffixes, infixes, and connected pronouns). The testing sample included 50 roots and their derivations. The results of this experiment are presented in table (1). The sample was composed of 60% (30 roots) of which was derived from sound verbs and 40% (20 roots) belonged to weak verbs. Table (1): Results of Testing the Proposed Model Total number of hits Correct Ratio Error ratio Sound Verbs 30 100 % 0.00 % Weak Verbs 20 98 % 0.02 % Total 50 99 % 0.01 % The testing is used to find out: (1) Roots of entire Arabic words (figures 7-a,b and c). (2) Morphological analysis of the entire Arabic words with associated analyzed properties, (figures (8-a, and b) (3) Possible diactrize of the entire Arabic words (figure 9). Fig. (7-a): Example to find Root of the Arabic word ()فسيكفيكهما 13 Fig. (7-b): Example to find Root of the Arabic word ()المهزوم Fig. (7-c): Example to find Root of the Arabic word ()ق 14 Fig. (8-a): Example to find Properties of the Arabic word ()فسيكفيكهم Fig. (8-b): Example to find Properties of the Arabic word ()ضارب Fig. (9): Example to find Properties of the Arabic word ()ضارب 15 5.2 Complexity Due to proposed model complexity, so turn our attention to how morphological analysis is conducted by the proposed model, we find that the running time cost is determined by three component of the following algorithm: Step 1: Checking the existence of the entire Arabic word and order of root using the proposed Arabic dictionary. Step 2: Validating prefixes and suffixes of the entire Arabic word using the proposed template Arabic grammar. Step 3: Validating infixes of the entire Arabic word – if needed. Therefore, for the first step “Checking the existence of the entire Arabic word and order of root using the proposed Arabic dictionary”, the comparison is carried out character by character, i.e.; we should assume that the number of comparisons would be: T1 = n Where n is length of the entire Arabic word (n=3 for trilateral or 4 for quadrilateral, or 5 for pentaliteral). At the second step, if the entire Arabic word exists in a proper sequence after validating prefixes and suffixes, such that are checked against a list of stored prefixes and suffixes, the number of comparisons determined as follows: T2 = Log Nps Where, Nps is the number of prefixes and suffixes. The validation of word infixes depends on two factors [32]: the size of difference between positions of the letters of root in the entire word, and the list of infix letters to be checked. Accordingly, the number of comparisons would be calculated as follows: T3 = D + M Where, D is the number of comparisons for checking the difference, M is the number of character comparisons to match an infix against the set infixes. Consequently, the overall running time for our proposed model is computed as the sum of the three factors listed above. T = T 1 + T2 + T 3 = n + (Log Nps) + (D + M) 16 References [1] Doug Arnold, Lorna Balkan, Siety Meijer, R.Lee Humphreys, and Louisa Sadler, MACHINE TRANSLATION: An Introductory Guide, 1995. [2] W.J. Hutchins and H.L. Somers. An Introduction to Machine Translation. Academic Press, London, 1992. [3] http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/ [4] Hassanin M. Al-Barhamtoshy, Understanding of Arabic Text, Ph. D. dissertation, Al-Azhar University, 1992. [5] Ronnie Cann. Formal Semantics. Cambridge University Press, Cambridge, 1993. [6] http://www.worldlingo.com/products_services/worldlingo_translator.html [7] Ashraf I Madkour and Hassanin M. Al-Barhamtoshy, Arabic Morphological Analyzer, Al-Azhar Engineering International Conference, AEIC 1993, CairoEgypt. [8] Mohammed Aljlayl, Ophir Frieder, Corpus Linguistics: Effective arabic-english cross-language information retrieval via machine-readable dictionaries and machine translation, Proceedings of the tenth international conference on Information and knowledge management, October 2001 . [9] Nasreen AbdulJaleel, Leah S. Larkey, Information retrieval session 3: cross language retrieval: Statistical transliteration for english-arabic cross language information retrieval, Proceedings of the twelfth international conference on Information and knowledge management, November 2003. [10] Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell, Arabic Information Retrieval: Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002. [11] Mohammed Aljlayl, Ophir Frieder, Information retrieval 1: On arabic search: improving the retrieval effectiveness via a light stemming approach, Proceedings of the eleventh international conference on Information and knowledge management, November 2002. [12] M. A. madkour, A. Al-samahy and Hassanin M. Al-Barhamtoshy, “An Arabic Morphological Analyzer”, Al Azhar Engineering International Confrence, AEIC 1991, Cairo, December 1991. [13] N. H. Hegazi, and A. A. Elsharkawi. "An Approach to a Computerized Lexical Analyzer for Natural Arabic Text". Proceedings of the Arabic Language conference, Kuwait,1985. [14] M. Geith, T. El-Sadany. "Arabic Morphological Analyzer on a Personal Computer". Proceedings of the First KSU Symposium on Computer Arabization.1987. [15] S. S. Al-Fadaghi and F. S. Al-Anzi.” A new algorithm to generate Root-pattern Forms”. Proceedings of the 11th National Computer Conference, KFUPM, P.391. 1989. [16] Y. Hilal “Morphological Analysis of Arabic Morphology", Computer Processing of the Arabic Language,Workshop Papers, vol. I, April, Kuwait.1985 [17] Botrous Thalouth and Abdullah Al-Dannan. “ A Comprehensive Arabic Morphological Analyzer /Generator”. IBM Kuwait Scientific Center. Feb. 1987. [18] Imad A. Al-Sughaiyer and Ibrahim A Al-Kharashi “Arabic Morphological Analysis Techniques: A Comprehensive Survey”, CERI internal report, KACST 2003. 17 مدينة الملك عبد العزيز،] مشروع بناء القاعدة الصرفية لمفردات اللغة العربية باستخدام الذخيرة اللغوية20[ .هـ1424/4/3 ، معهد بحوث الحاسب واإللكترونيات،للعلوم والتقنية [20]Violetta Cavalli-Sforza, Abdelhadi Soudi, Teruko Mitamura , Arabic morphology generation using a concatenative strategy, Proceedings of the first conference on North American chapter of the Association for Computational Linguistics, April 2000. [21] Leah S. Larkey, Fangfang Feng, Margaret Connell, Victor Lavrenko, Machine learning for IR: Language-specific models in multilingual topic tracking, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004. [22] Abduelbaset Goweder, Massimo Poesio, Anne De Roeck, Posters: Broken plural detection for arabic information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004. [23] Ying Zhang, Phil Vines, Cross-language information retrieval: Using the web for automated translation extraction in cross-language information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004. [24] Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Tom De Groeve, Information access and retrieval (IAR): Geographical information recognition and visualization in texts written in various languages, Proceedings of the 2004 ACM symposium on Applied computing, March 2004. [25] Markus Forsberg, Aarne Ranta, Functional morphology, ACM SIGPLAN Notices , Proceedings of the ninth ACM SIGPLAN international conference on Functional programming, Volume 39 Issue 9, September 2004 . [26] Yan Qu, Gregory Grefenstette, David A. Evans, Cross-lingual information retrieval: Automatic transliteration for Japanese-to-English text retrieval, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, July 2003. [27] Leah S. Larkey, Margaret E. Connell, Nasreen Abduljaleel, Hindi CLIR in thirty days, ACM Transactions on Asian Language Information Processing (TALIP), Volume 2 Issue 2, June 2003. [28] Ari Pirkola, Jarmo Toivonen, Heikki Keskustalo, Kari Visala, Kalervo Järvelin, Cross-lingual information retrieval: Fuzzy translation of cross-lingual spelling variants, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, July 2003. [29] Paul McNamee, James Mayfield, Cross-language Information Retrieval: Comparing cross-language query expansion techniques by degrading translation resources, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002. [30] Satoshi Sekine, Ralph Grishman, Hindi-english cross-lingual question-answering system, ACM Transactions on Asian Language Information Processing (TALIP), Volume 2 Issue 3, September 2003. [31] William J. Black and Sabri El-Kateb, A Prototype English-Arabic Dictionary based on Word Net, UMIST, Department of Computation, Manchester, M60 1QD, UK, Piek Vossen (Eds): GWC 2004, Proceedings, pp. 67-74. [32] Suleiman H. Mustafa (2003), A Morphology-driven string matching approach to Arabic text searching, the Journal of Systems and Software 67 (2003) 77-87. 18 Hassanin M. Al-Barhamtoshy is a professor of computer science in the Department of Information Technology at King Abdulaziz University (Jeddah, Saudi Arabia). He earned his Ph.D. in computers and systems engineering from the University of Al-Azhar (Egypt) in 1992. He was granted several academic awards and scholarships. After graduation, he worked at AlAzhar University and chaired many external projects for this four years. In 1996 he went on leave from Al-Azhar for six years during which he worked in the Department of Computer Science at King AbdulAziz University, Faculty of Science. He is at present a full professor at KAU, Faculty of Computing and Information Technology. He has published several papers in a number of research areas in computer science and computer engineering including natural language processing (especially Arabization of computers), database and information retrieval systems, software engineering and artificial intelligence. 19