Arabic Morphology Template Grammar-based
Hassanin M. Al-Barhamtoshy, Khalid O. Thabit and Basil Ba-Aziz
KAU, Faculty of Computing and Information Technology, Jeddah
Abstract
This research presents a multi-language natural language processing model for use in
machine translation and language processing systems. We describe problems of
analysis, taking lexical and syntactic ambiguity into consideration.
Different types of linguistic and non-linguistic knowledge are necessary to resolve
these ambiguities, and in this research we examine in more detail how to
represent this knowledge.
In addition, the research describes a system for generating natural-language sentences
from syntactic and lexical structures, based on an internal (or interlingual)
representation. The model is being developed as part of an Arabic–English
Machine Translation (MT) system; however, it is designed to be usable for
many other MT language pairs and natural language applications.
The contributions of this work include building a dictionary for use in
automatic translation.
1. Introduction
To build good natural language processing (NLP) for translation models, the
following subsections describe the different sub-models of NLP.
1.1. Dictionary
Dictionaries are the largest components of machine translation (MT, or automatic
translation) systems in terms of the amount of information they hold. If they are more
than simple word lists, they may well be the most expensive components to
construct [1-3]. Consequently, users may need to make additions to the system
dictionaries to make a system useful.
To give an idea of the dictionary size that may be needed: for commercial purposes,
a lexicon of 20,000 entries is often considered the minimum. Existing dictionaries
are much larger; the Oxford English Dictionary contains about 250,000 entries without
being exhaustive even of general usage. In fact, no dictionary can ever be complete [1, 2].
1.2. Word Types
It is useful to distinguish between the inherent properties of a word and the properties
it has with respect to its place in the sentence, that is, its grammatical environment.
Each word has a type with respect to its morphological analysis. These types
include grammatical properties such as the indication of gender in some languages (e.g. the
Arabic or the French part of a bilingual dictionary entry) and the indication of
number on nouns. Typically, the citation form of a noun is its singular form [1-5].
1.3. Dictionaries and Morphology
Morphology is concerned with the internal structure of words and how words can be formed. In
Arabic, it is usual to distinguish three different word formation processes [1,4,7]:
1. Inflectional processes, by means of which a word is derived from another word
form, acquiring certain grammatical features but keeping the same part of
speech or category (e.g. walk, walks);
2. Derivational processes, in which a word of a different category is derived from
another word or word stem by the application of some process (e.g. grammar →
grammatical, grammatical → grammaticality);
3. Compounding, in which independent words come together in some way to form a
new sentence unit (in Arabic, شكرناهم, "we thanked them").
In Arabic, inflectional and derivational processes involve prefixes (as in فنشكر) and
suffixes (as in شكرناهم), as well as what is called pronoun inflection, or subwords. In other
languages, a range of devices, such as changes in the vowel patterns of words and
doubling or reduplication of syllables, are also found. Clearly, these prefixes and
suffixes (collectively known as affixes) cannot "stand alone" as words. Compounding
is quite different in that the parts can each occur as individual words.
1.4. Ambiguity
Most natural language processing assumes that a word or sentence has only one meaning.
As is well known, however, this is often not the case. When a word has more than one meaning,
it is said to be lexically ambiguous. When a phrase or sentence can have more than one
structure, it is said to be structurally ambiguous [4,5].
1.5. Semantics
Semantics is concerned with the meaning of words and how they combine to form
sentence meanings [5]. It is useful to distinguish lexical semantics from structural
semantics: the former deals with the meanings of words, the latter with the
meanings of phrases, including sentences [6].
There are many ways of thinking about and representing word meanings, but one that
has proved useful in the field of machine translation involves associating words with
semantic features which correspond to their sense components. For example, the
words man, woman, boy, and girl might be represented as [1, 5, 6]:
man = (+HUMAN, +MASCULINE, +ADULT)
woman = (+HUMAN, -MASCULINE, +ADULT)
boy = (+HUMAN, +MASCULINE, -ADULT)
girl = (+HUMAN, -MASCULINE, -ADULT)
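To make the feature notation concrete, the four entries above can be stored as feature dictionaries. The sketch below is only an illustration (the names `FEATURES`, `contrast` and `select` are hypothetical, not part of the described system): it finds the features on which two words differ and selects the words that match a feature specification, as a translation step might do when choosing a target word whose gender and age features agree with the source.

```python
# Semantic features from the example above: True stands for "+", False for "-".
FEATURES = {
    "man":   {"HUMAN": True, "MASCULINE": True,  "ADULT": True},
    "woman": {"HUMAN": True, "MASCULINE": False, "ADULT": True},
    "boy":   {"HUMAN": True, "MASCULINE": True,  "ADULT": False},
    "girl":  {"HUMAN": True, "MASCULINE": False, "ADULT": False},
}


def contrast(a: str, b: str) -> list[str]:
    """Return the (sorted) features on which words a and b take different values."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sorted(f for f in fa if fa[f] != fb.get(f))


def select(**required: bool) -> list[str]:
    """Return the words whose features include every required value."""
    return [word for word, feats in FEATURES.items()
            if all(feats.get(k) == v for k, v in required.items())]


print(contrast("man", "girl"))           # ['ADULT', 'MASCULINE']
print(select(HUMAN=True, ADULT=False))   # ['boy', 'girl']
```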
An Arabic translation dictionary must be designed to a professional standard of
linguistic translation. Figures 1 and 2 give case-study examples of
English-to-Arabic and French-to-English translation [6].
Fig. (1): English to Arabic simple translator
Fig. (2): French to English simple translator
2. Building Arabic Dictionary
In cross-language information retrieval (CLIR) systems, queries in one language retrieve
relevant documents in other languages. Machine-Readable Dictionaries (MRDs) and
Machine Translation (MT) are important resources for query translation in CLIR [8].
Aljlayl and Frieder investigate MT and MRDs for Arabic-English CLIR, where the
translation ambiguity associated with these resources is the key problem. They present
three methods of query translation using a bilingual dictionary for Arabic-English
CLIR [8].
Out-of-vocabulary (OOV) words are problematic for cross-language information
retrieval. When the two languages have different alphabets, one way to deal with
OOV words is to transliterate them, that is, to render them in the
orthography of the second language. AbdulJaleel and Larkey [9] present a
simple statistical technique for training an English-to-Arabic transliteration model
from pairs of names.
Because Arabic is highly inflected and derivational, it requires good stemming for
effective information retrieval, yet no standard approach to stemming has emerged [10-13].
Larkey et al. [10] developed several heuristic light stemmers and a statistical stemmer
based on co-occurrence analysis for Arabic retrieval, and compared the retrieval
effectiveness of these stemmers with that of a morphological analyzer on the TREC-2001 data.
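Light stemming, unlike full root extraction, only removes a small set of frequent prefixes and suffixes. The sketch below illustrates the idea; the affix lists and the minimum stem length of two letters are illustrative assumptions, not the exact settings evaluated in [10].

```python
# Illustrative affix lists (an assumption; not the exact lists evaluated in [10]).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str, min_stem: int = 2) -> str:
    """Strip at most one prefix, then suffixes repeatedly, keeping >= min_stem letters."""
    # Remove the longest matching prefix (only one).
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    # Remove suffixes, longest first, until none applies.
    stripped = True
    while stripped:
        stripped = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


print(light_stem("والكتاب"))   # كتاب
print(light_stem("المعلمون"))  # معلم
```

Such a stemmer leaves infixed forms and broken plurals untouched, which is the limitation discussed for [22].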
The inflectional structure of words affects the retrieval accuracy of information
retrieval systems for Latin-based languages. Different stemming algorithms for Arabic
information retrieval systems have been presented [11-18]. The effectiveness of surface-based
retrieval has also been investigated; this approach degrades retrieval precision because
Arabic is a highly inflected language. A root-based retrieval model was therefore
proposed [11], which showed a statistically significant improvement over the surface-based
approach.
Arabic inflectional morphology requires infixation, prefixation and suffixation, giving
rise to a large space of morphological variation [12]. An approach to reducing the
complexity of Arabic morphology generation using grammar-based rules has also been
described: by decoupling the problem of stem changes from that of prefixes and
suffixes, the number of rules required is significantly reduced, by as much as a factor
of three for certain verb types [18].
Topic tracking is complicated when the stories in the stream occur in multiple
languages. Typically, researchers have trained only English topic models, because the
training stories have been provided in English; in tracking, non-English test stories are
then machine translated into English to compare them with the topic models. Larkey et al.
[21] proposed a native language hypothesis, stating that comparisons would be more
effective in the original language of the story.
Because of the large number of inflectional variants of Arabic words, empirical results
suggest that stemming is essential for Arabic information retrieval, and light stemming
in particular has led to improvements in retrieval. However, current light stemming
algorithms do not extract the correct stem of irregular (so-called broken) plurals,
which constitute about 10% of Arabic texts and about 41% of plurals [22].
There have been advances in cross-language information retrieval (CLIR) in recent
years. One of the major remaining reasons that CLIR does not perform as well as
monolingual retrieval is the presence of out-of-vocabulary (OOV) terms. Previous
work has either relied on manual intervention or has been only partially successful in
solving this problem. Zhang and Vines [23] extend earlier work in this area by
augmenting it with statistical analysis and corpus-based translation.
In another paper, a system is described that recognizes place names in natural language
text in order to produce geographic maps and animations showing how the geographical
coverage of texts about a certain subject changes over time. As the system is built
to analyze texts in many different languages, it keeps the use of linguistic
analysis tools to a minimum. Instead, it relies on a gazetteer (geographic dictionary)
containing place names in different languages and uses heuristics for disambiguation
[24].
A methodology for implementing natural language morphology in the functional
language Haskell is introduced in [25]. As stated in [25], the main idea is simple:
instead of working with untyped regular expressions, which are the state of the art of
morphology in computational linguistics, finite functions and algebraic data types are
used. The definitions of these data types and functions form the language-dependent
part of the morphology.
For cross-language information retrieval (CLIR) based on bilingual translation
dictionaries, good performance depends on the lexical coverage of the dictionary. This
is especially true for languages with few inter-language cognates, such as
Japanese and English. Qu et al. [26] describe a method for automatically creating and
validating candidate Japanese transliterations of English words, using a phonetic
English dictionary and a set of probabilistic mapping rules.
As participants in the TIDES Surprise Language Exercise, researchers at the University
of Massachusetts helped collect Hindi-English resources and developed a cross-language
information retrieval system. Its components included normalization, stopword removal,
transliteration, structured query translation, and language modeling
using a probabilistic dictionary derived from a parallel corpus. Existing technology
was successfully applied to Hindi [27].
A novel two-step fuzzy translation technique is presented for cross-lingual spelling
variants. In the first stage, transformation rules are applied to source words to render
them more similar to their target language equivalents. The rules are generated
automatically using translation dictionaries as source data. In the second stage, the
intermediate forms obtained in the first stage are translated into a target language
using fuzzy matching [28].
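The two stages can be illustrated with a toy example. This is not the technique of [28] itself: the rewrite rules and the target word list below are hypothetical (loosely Finnish-like spellings), and the similarity ratio of Python's standard `difflib` module stands in for the fuzzy matching method used there.

```python
import difflib

# Hypothetical stage-1 transformation rules (source spelling -> target spelling),
# applied in order. In [28] such rules are generated automatically from dictionaries.
RULES = [("ch", "k"), ("ph", "f"), ("c", "k")]

# Hypothetical target-language lexicon used for stage-2 matching.
TARGET_LEXICON = ["kromosomi", "kromi", "molekyyli", "fotoni"]


def transform(word: str) -> str:
    """Stage 1: rewrite the source word so that it resembles target-language spelling."""
    for source, target in RULES:
        word = word.replace(source, target)
    return word


def fuzzy_translate(word: str, lexicon: list[str], n: int = 1) -> list[str]:
    """Stage 2: fuzzy-match the intermediate form against the target lexicon."""
    return difflib.get_close_matches(transform(word), lexicon, n=n, cutoff=0.6)


print(transform("chromosome"))                        # kromosome
print(fuzzy_translate("chromosome", TARGET_LEXICON))  # ['kromosomi']
```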
While many investigations have explored the use of query expansion techniques to
combat errors induced by translation, no study had examined the effectiveness of
these techniques across resources of varying quality. McNamee and Mayfield [29] present
results using parallel corpora and bilingual wordlists that were deliberately degraded
prior to query translation.
A cross-lingual question-answering (CLQA) system for Hindi and English has been
developed [30]. It accepts questions in English, finds candidate answers in Hindi
newspapers, and translates the answer candidates into English along with the context
surrounding each answer. The system was developed as part of the surprise language
exercise (SLE) within the TIDES program [30].
3. Proposed Model System Structure
The proposed model consists of the following steps:
Step 1: The Arabic words are looked up in an Arabic electronic dictionary, and the
morphological component, which contains specific rules dealing with the
regularities of inflection, is then employed. The appropriate category (for example,
noun, verb, or special character) is assigned.
Step 2: Some rules of an Arabic grammar are used to try to parse the entire word.
An advanced parser might, for example, work out that a word is in fact a measure
modifier. However, it is quite possible for the parser simply to segment the entire
word into its components (extracting its implicit pronouns from the affixes), because
the difference between the Arabic and some possible English translations is not great.
Step 3: The engine then applies source-to-target (Arabic-to-English)
transformation rules. The first step here is to find translations of the Arabic
words in an Arabic-to-English dictionary.
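The three steps can be sketched as a toy word-by-word pipeline. Everything in it is an assumption for illustration: the tiny lexicons, the affix lists, and the hypothetical transfer rule mapping verb affixes to English subject pronouns (ي...ون to "they", ن to "we"). It only shows how dictionary lookup, affix analysis and transfer rules feed one another, not the actual rules of the engine.

```python
# Toy lexicons (hypothetical; a real system would query the dictionary database).
AR_CATEGORY = {"شرب": "verb", "ماء": "noun"}
AR_EN = {"شرب": "drink", "ماء": "water"}
PREFIXES = ["ال", "ي", "ن"]
SUFFIXES = ["ون"]
# Hypothetical transfer rule: present-tense verb affixes -> English subject pronoun.
SUBJECTS = {("ي", "ون"): "they", ("ن", ""): "we"}


def analyze(word: str) -> dict:
    """Step 1: dictionary lookup plus affix stripping to assign a category."""
    for prefix in ["", *PREFIXES]:
        for suffix in ["", *SUFFIXES]:
            if (word.startswith(prefix) and word.endswith(suffix)
                    and len(word) > len(prefix) + len(suffix)):
                stem = word[len(prefix):len(word) - len(suffix)]
                if stem in AR_CATEGORY:
                    return {"stem": stem, "prefix": prefix, "suffix": suffix,
                            "category": AR_CATEGORY[stem]}
    return {"stem": word, "prefix": "", "suffix": "", "category": "unknown"}


def transfer(analysis: dict) -> str:
    """Steps 2-3: use the affix features and the Arabic-English dictionary."""
    stem, category = analysis["stem"], analysis["category"]
    if category == "verb":
        subject = SUBJECTS.get((analysis["prefix"], analysis["suffix"]))
        return f"{subject} {AR_EN[stem]}" if subject else AR_EN[stem]
    if category == "noun":
        article = "the " if analysis["prefix"] == "ال" else ""
        return article + AR_EN[stem]
    return stem  # unknown words are passed through untranslated


def translate(sentence: str) -> str:
    return " ".join(transfer(analyze(w)) for w in sentence.split())


print(translate("يشربون الماء"))  # they drink the water
```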
We can now summarize some of the distinctive design features of this engine:
• Input sentences are parsed automatically only as far as is necessary for
successful operation, using various morphological and lexical (structure-based) rules
and phrasal transformation rules. The transformer engine is often content
to find out just a few incomplete pieces of information about the structure of
some of the phrases in a sentence, and where the main verb might be.
• Morphological rules are applied first, covering all possible derivations of
all the words in the sentence. In practice, the transformer model takes
some of the analyzed features and translates them into target features. Thus, in
the Arabic-to-English transformer system, we assumed that the grammar covers
only some features of Arabic.
• Syntactic rules take these analyzed features, together with the extracted features,
and determine the syntactic form of the sentence (surface representation).
• Lexical rules are then applied to determine whether such a representation has
a meaning.
• The use of limited grammars and incomplete parsing means that transformer
systems do not generally construct complex representations of input sentences;
in many cases they do not build even the simplest surface constituent structure tree.
• Most of the engine's translational competence lies in the rules that transform
bits of the input sentence into bits of the output sentence, including the bilingual
dictionary rules. In a sense, a transformer system has some knowledge of the
comparative grammar of the two languages, that is, of what makes one structurally
different from the other.
The proposed model is based on a bilingual dictionary. We therefore aim to create a
new dictionary based on the philosophy of the WordNet dictionary [31]. The design and
implementation of the model reported here are based on a bilingual Arabic/English
dictionary, in which a relational database may be employed to store the syntactic and
lexical indicators and the conceptual relations.
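A minimal sketch of such a relational dictionary, using Python's built-in `sqlite3` module, is shown below. The table and column names are hypothetical; they simply mirror the information named in the text (roots with their verb types, templates with their indicators, Arabic-English translations, and WordNet-style conceptual relations).

```python
import sqlite3

# Hypothetical schema for the bilingual Arabic/English template dictionary.
SCHEMA = """
CREATE TABLE IF NOT EXISTS root (
    root_id INTEGER PRIMARY KEY,
    root    TEXT NOT NULL,     -- unvowelled root, e.g. شرب
    type_id INTEGER NOT NULL   -- verb type, used to select templates
);
CREATE TABLE IF NOT EXISTS template (
    template_id INTEGER PRIMARY KEY,
    type_id     INTEGER NOT NULL,
    pattern     TEXT NOT NULL, -- digits 1, 2, 3 mark the root-letter slots
    features    TEXT           -- syntactic/lexical indicators of the pattern
);
CREATE TABLE IF NOT EXISTS translation (
    root_id INTEGER NOT NULL REFERENCES root(root_id),
    english TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS concept_relation (  -- WordNet-style links between roots
    source_id INTEGER NOT NULL REFERENCES root(root_id),
    relation  TEXT NOT NULL,                    -- e.g. 'synonym' or 'hypernym'
    target_id INTEGER NOT NULL REFERENCES root(root_id)
);
"""


def open_dictionary(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def english_for(conn: sqlite3.Connection, root: str) -> list[str]:
    """Return the English translations stored for an Arabic root."""
    rows = conn.execute(
        "SELECT t.english FROM translation AS t "
        "JOIN root AS r ON r.root_id = t.root_id "
        "WHERE r.root = ? ORDER BY t.english",
        (root,),
    )
    return [english for (english,) in rows]


if __name__ == "__main__":
    conn = open_dictionary()
    conn.execute("INSERT INTO root VALUES (1, 'شرب', 1)")
    conn.execute("INSERT INTO template VALUES (1, 1, 'ي123ون', 'present, 3rd person masc. plural')")
    conn.execute("INSERT INTO translation VALUES (1, 'drink')")
    print(english_for(conn, "شرب"))  # ['drink']
```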
3.1. Model Activity Diagram
As described in much of the literature, an activity diagram shows the flow of control
between activities, which are drawn as rounded rectangles. All transitions between
activities are represented by arrows, and horizontal bars are used to represent
activities performed in parallel. Figure 3 shows the flow of control for finding the
root of a verb.
The model is based on Arabic template dictionaries (Arabic verb types, roots and
template patterns). Consequently, each rule is illustrated in terms of the
relational database dictionary.
[Figure 3 activities: Start; Read Arabic word/root; Arabic verb types, Arabic verb roots and templates (dictionary tables); Match template; Find root for Arabic verb; Generate vowelized Arabic word; Extract all derivatives for root; Extract root, root type, attributes; End]
Fig. 3: Find Arabic Root Activity Diagram
3.2. Generate Non-Diacritic Arabic Word
This function generates an Arabic word without diacritics from two inputs, the
Arabic root and the template for the word, as the following example shows:

Parameter | Value
Arabic root | شرب
Template | ي123ون

The letter mask symbols (described further in Section 4) are given in the following table:

Symbol | Represents
1 | first letter of the root
2 | second letter of the root
3 | third letter of the root
Any Arabic letter (أبتثجحخدذرزسشصضطظعغفقكلمنهوي) | the same Arabic letter
Extended Arabic letters (إأآؤءئ) | the same letter

The output will be:

Output | Value
Generate non-diacritic Arabic word | يشربون
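The substitution described by the mask table can be sketched in a few lines. This is a minimal illustration assuming that templates are plain strings in which the digits 1, 2, 3 mark root-letter slots; the additional templates 1ا23 (the فاعل pattern) and م12و3 (the مفعول pattern) are our own examples.

```python
def generate_word(root: str, template: str) -> str:
    """Fill a template: digit k is replaced by the k-th root letter (1-based);
    any other character (ordinary or extended Arabic letter) is copied unchanged."""
    letters = []
    for symbol in template:
        if symbol.isdigit():
            index = int(symbol) - 1
            if not 0 <= index < len(root):
                raise ValueError(f"template slot {symbol} has no root letter")
            letters.append(root[index])
        else:
            letters.append(symbol)
    return "".join(letters)


print(generate_word("شرب", "ي123ون"))  # يشربون
print(generate_word("شرب", "1ا23"))    # شارب
print(generate_word("شرب", "م12و3"))   # مشروب
```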
3.3. Generate Diacritic Arabic Word
This function generates a diacritized Arabic word from two inputs, the Arabic
root and the template for the word, as the following example shows:

Parameter | Value
Arabic root | شرب
Template | يQ1 Q2 Q3 X

The vowel and letter mask symbols (described further in Section 4) are given in the following table:

Symbol | Represents
1 | first letter of the root
2 | second letter of the root
3 | third letter of the root
Q | fatha (U+064E)
X | sukun (U+0652)
Any Arabic letter (أبتثجحخدذرزسشصضطظعغفقكلمنهوي) | the same Arabic letter
Extended Arabic letters (إأآؤءئ) | the same letter

The output will be:

Output | Value
Generate diacritic Arabic word | يشرب
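The diacritic symbols can be handled by the same substitution, with Q and X mapped to the Unicode combining marks for fatha and sukun. Because the exact symbol order of the template above cannot be recovered from the text, this sketch assumes that each diacritic symbol directly follows the letter it marks; the template يQ1X2Q3 used in the example is therefore hypothetical.

```python
FATHA = "\u064e"   # U+064E ARABIC FATHA
SUKUN = "\u0652"   # U+0652 ARABIC SUKUN
VOWEL_SYMBOLS = {"Q": FATHA, "X": SUKUN}


def generate_diacritized(root: str, template: str) -> str:
    """Digits -> root letters, Q -> fatha, X -> sukun; spaces are ignored and every
    other character is copied unchanged. Diacritics are combining marks, so each
    one attaches to the letter emitted just before it."""
    out = []
    for symbol in template:
        if symbol.isspace():
            continue
        if symbol in VOWEL_SYMBOLS:
            out.append(VOWEL_SYMBOLS[symbol])
        elif symbol.isdigit():
            index = int(symbol) - 1
            if not 0 <= index < len(root):
                raise ValueError(f"no root letter for slot {symbol}")
            out.append(root[index])
        else:
            out.append(symbol)
    return "".join(out)


word = generate_diacritized("شرب", "يQ1X2Q3")
print(word)                                        # يَشْرَب
print(word.replace(FATHA, "").replace(SUKUN, ""))  # يشرب
```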
3.4. Extract the Root of an Arabic Word
Extracting the root of an Arabic word uses a somewhat more complex algorithm, which
involves multiple functions and multiple masks. First, the function finds the
templates that match the input word. It then removes all characters that are not
required and keeps the original verb letters; the output is all matched roots.
3.5. Generate All Possible Derivative Patterns
Generating all possible derivative patterns uses several functions. First, we find
the type of the input root verb and the templates that apply to this verb, for
example:

Verb | TypeID | Present type | Real root
وفى | 29 | 2 | وَفَى
شرب | 1 | 1 | شَرَبَ
شرب | 1 | 5 | شَرِبَ

As the table shows, the verb وفى has verb type 29 and present type 2,
which match only one template table.
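The lookup-then-generate step can be sketched as follows: the (TypeID, present type) pair selects a template list, and every template in it is filled with the root letters. The contents of `TEMPLATE_TABLE` are hypothetical examples for شرب; weak verbs such as وفى (type 29) would need their own templates, which are not shown here.

```python
# Hypothetical template table keyed by (verb TypeID, present type).
TEMPLATE_TABLE = {
    (1, 5): ["123", "ي123", "ي123ون", "1ا23", "م12و3"],
}


def fill(root: str, template: str) -> str:
    """Replace digit k by the k-th root letter; copy other letters unchanged."""
    return "".join(root[int(s) - 1] if s.isdigit() else s for s in template)


def derivatives(root: str, type_id: int, present_type: int) -> list[str]:
    """Generate every derivative allowed by the templates of the verb's type."""
    templates = TEMPLATE_TABLE.get((type_id, present_type), [])
    return [fill(root, t) for t in templates]


print(derivatives("شرب", 1, 5))
# ['شرب', 'يشرب', 'يشربون', 'شارب', 'مشروب']
```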
4. Arabic Template Rules of the Proposed Model
The three affix operations (prefixes, infixes and suffixes) can be used to extract
roots from Arabic words using derivation templates.
Conversely, derived Arabic words can be generated from Arabic roots by applying the
three affix templates.
The input Arabic word is combined with a second input (the affix templates, called a
mask) to find the Arabic root, as shown in Figure 4.
[Figure 4: Arabic word + mask (affix templates) → morphological rules (AND, OR, XOR) → Arabic root + indicators]
Fig. 4: Arabic Template Mask
4.1. Unsetting Rules
Some rules use the AND operator; others use the OR or XOR operators. To unset
characters, an unsetting mask of the same length (in characters) as the input word is used.
The rules for extracting the root can therefore be summarized as follows:
1. To unset a character in the input Arabic word, use 0 for the corresponding
character in the mask.
2. To leave a character in the input Arabic word unchanged, use 1 for the
corresponding character in the mask.
3. Use the AND/OR/XOR operators to extract the Arabic root and additional
indicators.
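Figure 5 gives an example of how these rules work: the input word يشكرون is combined
by the AND operator with an input mask template containing one digit per letter
(ي 0, ش 1, ك 1, ر 1, و 0, ن 0).
[Figure 5: input word يشكرون AND input mask template → output: root = شكر, prefix = ي, suffix = ون, infix = none]
Fig. (5): Example of Arabic Template Mask for (يشكرون)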
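The unsetting rule can be sketched as a function over a 0/1 mask written in reading order, one digit per letter. This is a minimal illustration of the AND idea rather than the model's implementation: letters under 1 are kept as the root, and the unset letters are classified as prefix, suffix or infix by their position relative to the kept letters. The second example (مشكور) is our own.

```python
def apply_unsetting_mask(word: str, mask: str) -> dict[str, str]:
    """AND-style unsetting: letters under 1 are kept (root), letters under 0 are unset.
    Unset letters before the first kept letter form the prefix, those after the last
    kept letter form the suffix, and those in between form the infix."""
    if len(mask) != len(word) or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a 0/1 string with one digit per letter")
    kept = [i for i, bit in enumerate(mask) if bit == "1"]
    if not kept:
        raise ValueError("mask keeps no letters")
    first, last = kept[0], kept[-1]
    return {
        "root": "".join(word[i] for i in kept),
        "prefix": word[:first],
        "suffix": word[last + 1:],
        "infix": "".join(word[i] for i in range(first, last + 1) if mask[i] == "0"),
    }


print(apply_unsetting_mask("يشكرون", "011100"))
# {'root': 'شكر', 'prefix': 'ي', 'suffix': 'ون', 'infix': ''}
print(apply_unsetting_mask("مشكور", "01101"))
# {'root': 'شكر', 'prefix': 'م', 'suffix': '', 'infix': 'و'}
```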
4.2. Setting Rules
This rule is used to find other derivations of a word, either after the first rule
(the unsetting rules) or on its own. A setting mask is used in the same manner, except
that the OR operator is used instead of AND. The setting-rules algorithm can be
summarized in the following steps:
1. To set a character in the input word, use 1 for the corresponding
character in the mask.
2. To leave a character in the input word unchanged, use 0 for the
corresponding character in the mask.
To illustrate these rules, consider the behaviour of the OR operator shown in
Figure (6), and assume that the input Arabic word is شكر. The mask should supply a
stream of alternatives in order to find all possible derivations of the word شكر.
[Figure 6: input word شكر OR prefixes mask, infixes mask and suffixes mask (adding masking rules) → output derivations]
Fig.(6): Example of Arabic Template Mask for (شكر) to find out its Arabic derivations
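A minimal sketch of the setting step follows. It assumes that a setting mask can be represented as the prefix, infix and suffix letters to add, rather than as a literal character-by-character bit pattern, and the alternative masks in the example are hypothetical. Note that blindly combining the alternatives also yields candidates such as شكرون that are not valid words, so generated forms still have to be checked against the dictionary or grammar.

```python
from itertools import product
from typing import Dict, List, Optional


def apply_setting_mask(root: str, prefix: str = "",
                       infixes: Optional[Dict[int, str]] = None,
                       suffix: str = "") -> str:
    """OR-style setting: add (set) prefix, infix and suffix letters around the root,
    leaving the root letters themselves unchanged. `infixes` maps a root-letter
    index to the letter inserted after it."""
    infixes = infixes or {}
    letters = []
    for i, letter in enumerate(root):
        letters.append(letter)
        if i in infixes:
            letters.append(infixes[i])
    return prefix + "".join(letters) + suffix


def derivations(root: str, prefixes: List[str], infix_masks: List[Dict[int, str]],
                suffixes: List[str]) -> List[str]:
    """Try every combination from the stream of alternative masks."""
    return sorted({apply_setting_mask(root, p, m, s)
                   for p, m, s in product(prefixes, infix_masks, suffixes)})


print(apply_setting_mask("شكر", infixes={0: "ا"}))              # شاكر
print(apply_setting_mask("شكر", prefix="م", infixes={1: "و"}))  # مشكور
print(len(derivations("شكر", ["", "ي"], [{}], ["", "ون"])))     # 4
```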
5. Results and Discussion
Arabic verbs may be triliteral, quadriliteral, or pentaliteral. Every Arabic verb has
its own derivatives, and these derivatives depend on its type. About 30 types of
triliteral verbs cover 5,321 verbs, which can produce 20,000 templates. If the affix rules
(4 prefixes and 30 suffixes) are applied to these templates, the total number
of Arabic words derived from verbs is 28,140,005.
5.1 Testing
Artificial Arabic words were used to test the proposed model. These words include
all of their possible derived verbs, nouns, adjectives, adverbs, etc., and various
combinations of affixes (prefixes, suffixes, infixes, and attached pronouns).
The test sample included 50 roots and their derivations: 60% (30 roots) were
derived from sound verbs and 40% (20 roots) from weak verbs. The results of this
experiment are presented in Table (1).
Table (1): Results of Testing the Proposed Model
Verb class | Roots tested | Correct ratio | Error ratio
Sound verbs | 30 | 100 % | 0 %
Weak verbs | 20 | 98 % | 2 %
Total | 50 | 99 % | 1 %
The tests were used to find:
(1) the roots of entire Arabic words (Figures 7-a, b and c);
(2) the morphological analysis of entire Arabic words with their associated analyzed
properties (Figures 8-a and b);
(3) the possible diacritizations of entire Arabic words (Figure 9).
Fig. (7-a): Example to find Root of the Arabic word (‫)فسيكفيكهما‬
Fig. (7-b): Example to find Root of the Arabic word (‫)المهزوم‬
Fig. (7-c): Example to find Root of the Arabic word (‫)ق‬
Fig. (8-a): Example to find Properties of the Arabic word (‫)فسيكفيكهم‬
Fig. (8-b): Example to find Properties of the Arabic word (‫)ضارب‬
Fig. (9): Example to find Properties of the Arabic word (‫)ضارب‬
5.2 Complexity
Turning our attention to how the proposed model conducts morphological analysis, we
find that its running-time cost is determined by the three components of the
following algorithm:
Step 1: Check the existence of the entire Arabic word and the order of its root letters
using the proposed Arabic dictionary.
Step 2: Validate the prefixes and suffixes of the entire Arabic word using the proposed
Arabic template grammar.
Step 3: Validate the infixes of the entire Arabic word, if needed.
For the first step, checking the existence of the entire Arabic word and the order of
its root letters in the proposed Arabic dictionary, the comparison is carried out
character by character, so we assume that the number of comparisons is

$$T_1 = n$$

where $n$ is the length of the entire Arabic word ($n = 3$ for triliteral, 4 for
quadriliteral, or 5 for pentaliteral words).
In the second step, if the entire Arabic word exists in a proper sequence, its prefixes
and suffixes are validated by checking them against a list of stored prefixes and
suffixes, and the number of comparisons is

$$T_2 = \log N_{ps}$$

where $N_{ps}$ is the number of stored prefixes and suffixes.
The validation of word infixes depends on two factors [32]: the size of the difference
between the positions of the root letters in the entire word, and the list of infix
letters to be checked. Accordingly, the number of comparisons is

$$T_3 = D + M$$

where $D$ is the number of comparisons for checking the difference and $M$ is the
number of character comparisons needed to match an infix against the set of infixes.
Consequently, the overall running time of the proposed model is the sum of the three
terms above:

$$T = T_1 + T_2 + T_3 = n + \log N_{ps} + (D + M)$$
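The logarithmic term $T_2$ corresponds to a binary search over a sorted list of the $N_{ps}$ stored affixes. The sketch below is a minimal illustration that assumes such a sorted list, takes the logarithm to base 2, and counts each three-way comparison once. It checks the worst case for $N_{ps} = 34$ (the 4 prefixes and 30 suffixes mentioned in Section 5, represented here by placeholder strings) and evaluates the cost formula; the values of $n$, $D$ and $M$ in the example are arbitrary.

```python
import math


def binary_search_comparisons(sorted_items: list[str], key: str) -> tuple[bool, int]:
    """Binary search that also counts three-way comparisons: returns (found, count)."""
    lo, hi, comparisons = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] == key:
            return True, comparisons
        if sorted_items[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, comparisons


def estimated_cost(n: int, n_ps: int, d: int, m: int) -> float:
    """T = T1 + T2 + T3 = n + log2(N_ps) + (D + M)."""
    return n + math.log2(n_ps) + (d + m)


# 4 prefixes + 30 suffixes; placeholder strings stand in for the real affixes.
affixes = sorted(f"affix{i:02d}" for i in range(34))
worst = max(binary_search_comparisons(affixes, a)[1] for a in affixes)
print(worst)                        # 6, i.e. floor(log2 34) + 1
print(estimated_cost(3, 32, 2, 2))  # 12.0
```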
References
[1] Doug Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler, Machine Translation: An Introductory Guide, 1995.
[2] W. J. Hutchins and H. L. Somers, An Introduction to Machine Translation, Academic Press, London, 1992.
[3] http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/
[4] Hassanin M. Al-Barhamtoshy, Understanding of Arabic Text, Ph.D. dissertation, Al-Azhar University, 1992.
[5] Ronnie Cann, Formal Semantics, Cambridge University Press, Cambridge, 1993.
[6] http://www.worldlingo.com/products_services/worldlingo_translator.html
[7] Ashraf I. Madkour and Hassanin M. Al-Barhamtoshy, "Arabic Morphological Analyzer", Al-Azhar Engineering International Conference, AEIC 1993, Cairo, Egypt.
[8] Mohammed Aljlayl and Ophir Frieder, "Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation", Proceedings of the Tenth International Conference on Information and Knowledge Management, October 2001.
[9] Nasreen AbdulJaleel and Leah S. Larkey, "Statistical transliteration for English-Arabic cross language information retrieval", Proceedings of the Twelfth International Conference on Information and Knowledge Management, November 2003.
[10] Leah S. Larkey, Lisa Ballesteros and Margaret E. Connell, "Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis", Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2002.
[11] Mohammed Aljlayl and Ophir Frieder, "On Arabic search: improving the retrieval effectiveness via a light stemming approach", Proceedings of the Eleventh International Conference on Information and Knowledge Management, November 2002.
[12] M. A. Madkour, A. Al-Samahy and Hassanin M. Al-Barhamtoshy, "An Arabic Morphological Analyzer", Al-Azhar Engineering International Conference, AEIC 1991, Cairo, December 1991.
[13] N. H. Hegazi and A. A. Elsharkawi, "An Approach to a Computerized Lexical Analyzer for Natural Arabic Text", Proceedings of the Arabic Language Conference, Kuwait, 1985.
[14] M. Geith and T. El-Sadany, "Arabic Morphological Analyzer on a Personal Computer", Proceedings of the First KSU Symposium on Computer Arabization, 1987.
[15] S. S. Al-Fadaghi and F. S. Al-Anzi, "A New Algorithm to Generate Root-Pattern Forms", Proceedings of the 11th National Computer Conference, KFUPM, p. 391, 1989.
[16] Y. Hilal, "Morphological Analysis of Arabic Morphology", Computer Processing of the Arabic Language, Workshop Papers, vol. I, Kuwait, April 1985.
[17] Botrous Thalouth and Abdullah Al-Dannan, "A Comprehensive Arabic Morphological Analyzer/Generator", IBM Kuwait Scientific Center, February 1987.
[18] Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi, "Arabic Morphological Analysis Techniques: A Comprehensive Survey", CERI internal report, KACST, 2003.
[19] Project for Building the Morphological Database of Arabic Vocabulary Using the Linguistic Corpus (al-Dhakhira al-Lughawiya), King Abdulaziz City for Science and Technology, Computer and Electronics Research Institute, 1424/4/3 AH.
[20] Violetta Cavalli-Sforza, Abdelhadi Soudi and Teruko Mitamura, "Arabic morphology generation using a concatenative strategy", Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics, April 2000.
[21] Leah S. Larkey, Fangfang Feng, Margaret Connell and Victor Lavrenko, "Language-specific models in multilingual topic tracking", Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2004.
[22] Abduelbaset Goweder, Massimo Poesio and Anne De Roeck, "Broken plural detection for Arabic information retrieval", Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2004.
[23] Ying Zhang and Phil Vines, "Using the web for automated translation extraction in cross-language information retrieval", Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2004.
[24] Bruno Pouliquen, Ralf Steinberger, Camelia Ignat and Tom De Groeve, "Geographical information recognition and visualization in texts written in various languages", Proceedings of the 2004 ACM Symposium on Applied Computing, March 2004.
[25] Markus Forsberg and Aarne Ranta, "Functional morphology", Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming, ACM SIGPLAN Notices, Volume 39, Issue 9, September 2004.
[26] Yan Qu, Gregory Grefenstette and David A. Evans, "Automatic transliteration for Japanese-to-English text retrieval", Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2003.
[27] Leah S. Larkey, Margaret E. Connell and Nasreen AbdulJaleel, "Hindi CLIR in thirty days", ACM Transactions on Asian Language Information Processing (TALIP), Volume 2, Issue 2, June 2003.
[28] Ari Pirkola, Jarmo Toivonen, Heikki Keskustalo, Kari Visala and Kalervo Järvelin, "Fuzzy translation of cross-lingual spelling variants", Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2003.
[29] Paul McNamee and James Mayfield, "Comparing cross-language query expansion techniques by degrading translation resources", Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2002.
[30] Satoshi Sekine and Ralph Grishman, "Hindi-English cross-lingual question-answering system", ACM Transactions on Asian Language Information Processing (TALIP), Volume 2, Issue 3, September 2003.
[31] William J. Black and Sabri El-Kateb, "A Prototype English-Arabic Dictionary Based on WordNet", UMIST, Department of Computation, Manchester, M60 1QD, UK; in Piek Vossen (Eds.), GWC 2004, Proceedings, pp. 67-74.
[32] Suleiman H. Mustafa, "A morphology-driven string matching approach to Arabic text searching", Journal of Systems and Software 67 (2003) 77-87.
Hassanin M. Al-Barhamtoshy is a professor of computer science in the Department of Information
Technology at King Abdulaziz University (Jeddah, Saudi Arabia).
He earned his Ph.D. in computers and systems engineering from Al-Azhar University (Egypt) in
1992. He was granted several academic awards and scholarships. After graduation, he worked at Al-Azhar
University for four years and chaired many external projects. In 1996 he went on leave
from Al-Azhar for six years, during which he worked in the Department of Computer Science at King
AbdulAziz University, Faculty of Science. He is at present a full professor at KAU, Faculty of
Computing and Information Technology. He has published several papers in a number of research
areas in computer science and computer engineering, including natural language processing (especially
Arabization of computers), database and information retrieval systems, software engineering and
artificial intelligence.