Arabicl-Template-PSU Research Proposal

PSU Research Proposal
Title: A Toolbox for Arabic Text Mining
Department: Computer Science
PI Name: Ahmed Sameh
Duration: 1 Year
Budget Est.: SR 55,000
Date: 12/20/2010
I - PROPOSAL
I-1: PROPOSAL TITLE (Provide a short descriptive title, give prominence to keywords)
A Toolbox for Arabic Text Mining
I-2: COMMERCIAL POTENTIAL
Could this project have commercial potential? (Select one): Yes / No
If yes, briefly elaborate on the commercial potential:
I-3: CHECK-LIST
Have you checked to ensure all questions in the application form have been answered?
Have you checked to ensure you have included the correct costs in your budget?
The principal investigator and all co-principal investigators should sign.
I-4: PERSONNEL AND AUTHORIZATION
PRINCIPAL INVESTIGATOR [PI]
Full Name: Ahmed Sameh
Academic Rank: Professor
College: CIS
Department: Computer Science
Telephone: 494-8524
Ext: X8524
Mobile: 0544299846
E-Mail: asameh@cis.psu.edu.sa
Signature:
Date: 12/20/2010
CO-INVESTIGATOR(S) [CIs] (non-PSU CIs permitted)
1)
Full Name: Mona Diab
Academic Rank: Assistant Professor
College:
Department: Linguistics Department & Natural Language Processing Group, Stanford University
Telephone:
Mobile:
E-Mail:
Signature:
Date: / /
2)
Full Name: NourelDean Soufian
Academic Rank: Assistant Professor
College: CIS
Department: Computer Science
Telephone:
Mobile:
E-Mail:
Signature:
Date: / /
3)
Full Name: Mohamed Tounsi
Academic Rank: Associate Professor
College: CIS
Department: Computer Science
Telephone:
Mobile:
E-Mail:
Signature:
Date: / /
4)
Full Name:
Academic Rank:
College:
Department:
Telephone:
Mobile:
E-Mail:
Signature:
Date: / /
5)
Full Name:
Academic Rank:
College:
Department:
Telephone:
Mobile:
E-Mail:
Signature:
Date: / /
6)
Full Name:
Academic Rank:
College:
Department:
Telephone:
Mobile:
E-Mail:
Signature:
Date: / /
II - DESCRIPTION
II-1: ABSTRACT (Provide a statement of the project - maximum 200 words)
Text Mining refers to the process of deriving high-quality information from text. High-quality information is
typically derived through the devising of patterns and trends through means such as statistical pattern
learning. Text mining usually involves the process of structuring the input text (usually parsing, along with
the addition of some derived linguistic features and the removal of others, and subsequent insertion into a
database), deriving patterns within the structured data, and finally evaluation and interpretation of the output.
'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness.
Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of
granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e.,
learning relations between named entities).
Natural language processing (NLP) for the Arabic language has struggled over the years. Very little
has been done in terms of producing powerful tools for Arabic processing. In fact, it is feared that Arabic may
come to be regarded as a language of the past, as many new terms and names in the modern world have no
equivalents in the Arabic language. This problem has developed over the years because Arabic
linguistics researchers are far removed from modern technological tools and have been reluctant to
collaborate with information technology researchers. This lack of communication and collaboration has led
to the current state of affairs in the NLP of Arabic text.
Arabic text mining is far behind English text mining. For English, powerful algorithms and tools exist in
the areas of text categorization, text clustering, concept/entity extraction, production of granular taxonomies,
sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between
named entities). This research proposal aims to rectify this situation by
developing an Arabic toolbox that covers basic algorithms comparable to the English ones.
In this project we will develop an Arabic toolbox that will contain algorithms for categorization and
classification, clustering and grouping of related documents, concept extraction, production of
taxonomies, a WordNet for verbs, nouns, and adjectives, a simple dictionary, sentiment analysis, and
document summarization. The toolbox will be web based, with a background database of documents and related
resources. An Arabic stemmer will be developed along with a tagging algorithm. Small sample
implementations of some of these algorithms are demonstrated in this proposal. These are initial results that
demonstrate the capabilities of the current team.
II-2: PROJECT GOALS AND OBJECTIVES
The specific goals of this project are to demonstrate the power of Text mining within the Arabic language in:
-Concept Mining: Concept mining is an activity that results in the extraction of concepts from a set of
documents. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data
mining and text mining. Because artifacts are typically a loosely structured sequence of words and other
symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the
meaning, provenance and similarity of documents. Traditionally, the conversion of words to concepts has
been performed using a thesaurus, and for computational techniques the tendency is to do the same. The
thesauri used are either specially created for the task, or a pre-existing language model, usually related to
Princeton's WordNet.
The mappings of words to concepts are often ambiguous. Typically each word in a given language will relate
to several possible concepts. Humans use context to disambiguate the various meanings of a given piece of
text, where available. Machine translation systems cannot easily infer context. For the purposes of concept
mining however, these ambiguities tend to be less important than they are with machine translation, for in
large documents the ambiguities tend to even out, much as is the case with text mining.
There are many techniques for disambiguation that may be used. Examples are linguistic analysis of the text
and the use of word and concept association frequency information that may be inferred from large text
corpora. Recently, techniques based on semantic similarity between the possible concepts and the context
have appeared and gained interest in the scientific community.
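The context-based disambiguation described above can be illustrated with a minimal sketch. It assumes each candidate concept comes with a small set of associated words and picks the concept whose set overlaps most with the surrounding words. The function name, the concept labels, and the toy word sets are hypothetical and are not part of the proposed toolbox.

```python
def disambiguate(word, context_words, concept_signatures):
    """Pick the concept whose associated words overlap most with the context.

    concept_signatures maps a word to {concept_label: set_of_associated_words}.
    Returns None when the word is unknown or no concept overlaps the context.
    """
    context = set(context_words)
    best_concept, best_overlap = None, 0
    for concept, signature in concept_signatures.get(word, {}).items():
        overlap = len(context & signature)
        if overlap > best_overlap:
            best_concept, best_overlap = concept, overlap
    return best_concept


# Hypothetical signatures for an ambiguous word (illustrative only).
SIGNATURES = {
    "عين": {
        "eye": {"رأى", "نظر", "وجه"},
        "spring": {"ماء", "نهر", "شرب"},
    }
}

context = "شرب الرجل ماء من عين".split()
print(disambiguate("عين", context, SIGNATURES))
```

A real system would derive the associated word sets from a thesaurus or from corpus statistics rather than from hand-written sets.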
-Arabic WordNet: The Arabic WordNet will be a lexical database for the Arabic language, modeled on
Princeton's WordNet for English. It groups Arabic words into sets of synonyms called synsets, provides
short, general definitions, and records the various semantic relations between these synonym sets. The
purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable,
and to support automatic text analysis and artificial intelligence applications.
The original WordNet database and software tools have been released under a BSD-style license and can be
downloaded and used freely. The database can also be browsed online. WordNet was created and is being
maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology
professor George A. Miller. Development began in 1985. Over the years, the project received funding from
government agencies interested in machine translation. As of 2011, WordNet does not have an Arabic
version. Arabic may be one of the few languages that does not have a WordNet version. This project will
build one for verbs only.
-Arabic Dictionary: An online interactive Arabic dictionary and thesaurus that helps users find the
meanings of words and draw connections to associated words. Based on the Arabic WordNet, we will develop
an Arabic dictionary. Our goals are for the dictionary to:
o Be an easy-to-use dictionary and thesaurus.
o Show how words associate in a visually interactive display.
o Provide ideas to help write content for a blog, article, or thesis, or simply to play with words.
o Place no limit on the number of searches, so that users can look up as many words as they need at any time.
The user types words in the search box and clicks Go or presses Enter. Once the words branch off the main
query, the user can explore them as follows:
o Place the mouse cursor over a word to view its meaning.
o Double-click a node on a branch to view other related words.
o Scroll the mouse wheel over words to zoom in or out, to see more associations or to view words and meanings more clearly.
o Click and drag a word or branch to move it around and explore other branches.
The interface queries the Arabic WordNet lexical database, which is modeled on the WordNet database
developed by Princeton University and made available for students and language researchers. The dictionary
groups synonyms into synsets through lexical relations between terms. These meanings and semantic
relationships are revealed graphically by interactive web technology of the kind made available by Snappy Words.
-Arabic Documents Classification: Document classification/categorization is a problem in information
science. The task is to assign an electronic document to one or more categories, based on its contents.
Document classification tasks can be divided into two sorts: supervised document classification where
some external mechanism (such as human feedback) provides information on the correct classification for
documents, and unsupervised document classification, where the classification must be done entirely
without reference to external information. There is also a semi-supervised document classification, where
parts of the documents are labeled by the external mechanism.
-Arabic Document summarization: Automatic summarization is the creation of a shortened version of a text
by a computer program. The product of this procedure still contains the most important points of the original
text. The phenomenon of information overload has meant that access to coherent and correctly-developed
summaries is vital. As access to data has increased so has interest in automatic summarization. An example
of the use of summarization technology is search engines such as Google. Technologies that can make a
coherent summary of any kind of text need to take into account several variables, such as length, writing style, and syntax, to make a useful summary.
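As an illustration of automatic summarization, the following is a minimal sketch of one simple extractive approach: score each sentence by the average frequency of its words in the whole text and keep the highest-scoring sentences in their original order. The proposal does not commit to a particular summarization method, so this technique, the function name, and the sample text are assumptions for illustration only.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Return the top-scoring sentences, kept in their original order.

    A sentence's score is the average corpus frequency of its words.
    """
    # Split on Latin and Arabic sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?؟]+", text) if s.strip()]
    freqs = Counter(word for s in sentences for word in s.split())

    def score(sentence):
        words = sentence.split()
        return sum(freqs[w] for w in words) / len(words)

    chosen = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    return [s for s in sentences if s in chosen]


sample = "النص مهم. النص طويل و النص مفيد. الجو جميل."
print(summarize(sample, max_sentences=1))
```

This sketch ignores stemming and stop words; in practice both would be applied before counting so that common function words do not dominate the scores.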
III - INTRODUCTION
III-1: REVIEW AND ANALYSIS OF RELATED WORK
Labor-intensive manual text mining approaches first surfaced in the mid-1980s, but technological advances
have enabled the field to advance during the past decade. Text mining is an interdisciplinary field that draws
on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most
information (common estimates say over 80%) is currently stored as text, text mining is believed to have a
high commercial potential value. Increasing interest is being paid to multilingual data mining: the ability to
gain information across languages and cluster similar items from different linguistic sources according to
their meaning.
Currently there is very little previous work in Arabic text mining. English text mining, by contrast,
has many algorithms and techniques. One of the directions that we will explore in this research is to borrow
ideas from these algorithms and try to develop similar Arabic versions.
III-2: SIGNIFICANCE OF WORK
The Arabic language needs more work from all of us to stand as a living language and to keep up with
current advances in technology. Such an Arabic toolbox is much needed in this era of
globalization.
IV - APPROACH AND METHODOLOGY
IV-1: METHODOLOGY
Until recently, websites most often used text-based searches, which only found documents containing
specific user-defined words or phrases. Now, through use of a semantic web, text mining can find content
based on meaning and context (rather than just by a specific word).
Additionally, text mining software can be used to build large dossiers of information about specific people
and events. For example, large datasets based on data extracted from news reports can be built to facilitate
social networks analysis or counter-intelligence. In effect, the text mining software may act in a capacity
similar to an intelligence analyst or research librarian, albeit with a more limited scope of analysis.
Text mining is also used in some email spam filters as a way of determining the characteristics of messages
that are likely to be advertisements or other unwanted material. Recently, text mining has received attention
in many areas.
Many text mining software packages are marketed towards security applications, particularly analysis of
plain text sources such as Internet news. Text mining is also involved in the study of text encryption.
One of the directions in this research is to adapt and modify selected English text mining tools (from the
above web site) in order to produce their equivalent Arabic versions. The cross-validation method requires
a very accurate English/Arabic translator that will provide input data to the algorithm/program conversion.
Areas of investigation in this project include Arabic natural language processing and text mining of the Quran.
The second objective is to improve the quantity and quality of Arabic content in the area of “Data
and Text Mining” on the Web. All published material from the Hub’s activities will be translated and
reviewed by its author(s) to be available in an Arabic Digital Library. A systematic plan to translate many
“data mining” articles and store them in a searchable Arabic Digital Library will be developed. Text and
multimedia mining tools will be used to explore the contents of this Arabic digital library and expose related and
correlated paragraphs and sections for the purpose of developing new Arabic text mining algorithms and
enhancing existing ones. This brings up the other area of focus of the Hub, which is unstructured text mining.
As for unstructured text mining: in parallel with the Arabic digital library, an English data
mining digital library (with the same contents) will also be developed. Both libraries will have a traditional
search engine in addition to more elaborate classification and categorization capabilities. Furthermore, text and
multimedia mining tools will be used to explore the contents of the two digital libraries and expose related and
correlated paragraphs and sections. Text mining is used to find interesting regularities in large textual digital
libraries, where interesting means non-trivial, hidden, previously unknown, and potentially useful. Both
Arabic and English text mining tools handle digital library text at the word level, sentence level, document
level, document-collection level, linked-document-collection level, and application level. Most
text mining methods rely on the fact that there is usually highly redundant data in the documents. Most of the
tools make use of document summarization techniques, single-document graph visualization algorithms,
segmentation algorithms, feature selection algorithms, similarity algorithms, clustering, and information
extraction techniques.
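To make the notion of a similarity algorithm concrete, the following is a minimal sketch of cosine similarity between the word-count vectors of two texts, a common building block for clustering and for exposing related paragraphs. The proposal does not name a specific similarity measure, so the choice of cosine similarity and the function name are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity("بنك سوق", "بنك وزير"))
```

Values close to 1 indicate texts with nearly the same word distribution, while 0 means the texts share no words.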
They also make use of several visualization techniques such as: WebSOM, ThemeScape, Graph-Based
visualization techniques, and Tiling-based visualization techniques.
Statistical tools for text mining include: YALE/RapidMiner word vector mining, UIMA by IBM, GATE, the AeroText
suite, Attensity, Endeca Technologies, Inxight, and LanguageWare.
Similar to what we provide for “Data Mining”, we also propose the same vertical stacking of text mining,
statistical, and visualization algorithms for performing text mining on both the English and the Arabic data
mining digital libraries. This will provide an interesting context for researchers in the “Text mining” and
“Arabization” fields to investigate how to improve the Arabic text mining algorithms, using the English ones
as a cross reference. A very interesting research direction can be developed there. For example, the
same mining questions can be posed to both the English and the Arabic digital libraries and the results can be
compared. Where the results differ, learning opportunities will arise, and modifications
and enhancements to the algorithms will be investigated. The two libraries will provide several ways and means for
verification, validation, and cross-checking.
[Figure: diagram labels: Text, Testing, Client, Engine]
Deliverables in Phase I: Beta Version I + its Benchmark + its Tuning
Deliverables in Phase II: Beta Version II + its Benchmark + its Tuning
Deliverables in Phase III: Beta Version III + its Benchmark + its Tuning
Deliverables in Phase IV: Final Version + User Manual
The following is the project plan schedule. It shows the different tasks within the research and the
estimated duration of each.
IV-2: AVAILABLE RESOURCES
Currently there are some open source text mining algorithms that can be used as tools in some of the above
investigations.
IV-3: EXPECTED RESULTS/OUTPUTS
The expected output from this project is a Web based Arabic toolbox that will contain basic Arabic
algorithms for Arabic natural language text mining. Some of the algorithms that will be provided under this
tool are: text categorization, text clustering, concept/entity extraction, production of granular taxonomies,
sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between
named entities).
The following are some initial results that we have already implemented in the domains of Arabic text
categorization/classification, construction of an Arabic WordNet for Arabic verbs, and development of an Arabic
stemmer. In the following paragraphs we provide short descriptions and some screen shots of the developed
tools.
This is an Arabic text mining tool that classifies documents according to word statistics. As the
number of Arabic documents published every day on the web and in other media has grown rapidly,
the need to analyze and classify these documents has become important. The tool takes as
input any document from the web or from newspapers and classifies it
into one of the categories it provides. These categories are:
o Economic paper
o Political paper
o Medical paper
o Religion paper
The general idea behind the tool is that it takes a document as an input provided by the user. The tool will
then store all the words used in the document without repeating or excluding any word. After that, the tool
will compute the frequency for each stored word which is then used as a statistic to classify the document.
The tool will use a number of databases as training sets to classify the document. The tool does the following
processing:
o Take the word with the highest frequency.
o Search for that word in each of the given databases.
o If the word is found in any one of the databases, the tool stops and classifies the document as the type of the database in which the word was found.
o If the word is not found in any database, the tool takes the word with the next highest frequency and repeats the search.
o If none of the stored words is found in any of the databases, the program states that the document cannot be classified.
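The processing steps above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names and the toy in-memory databases are hypothetical, and the real tool reads its databases from text files.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each whitespace-separated word occurs in the text."""
    return Counter(text.split())


def classify(text, databases):
    """Classify by the most frequent word that appears in any database.

    Words are tried in descending frequency order. Returns None when no
    word of the document is found in any database.
    """
    for word, _count in word_frequencies(text).most_common():
        for category, vocabulary in databases.items():
            if word in vocabulary:
                return category
    return None


# Hypothetical toy databases standing in for the training-set text files.
DATABASES = {
    "economic": {"اقتصاد", "سوق", "بنك"},
    "political": {"حكومة", "انتخابات", "وزير"},
    "medical": {"مرض", "علاج", "طبيب"},
    "religion": {"صلاة", "صوم", "حج"},
}

document = "في سوق المال ارتفع سهم بنك كبير في سوق اليوم وقال وزير ان سوق المال مستقر"
print(classify(document, DATABASES))
```

Because common words such as "في" appear in none of the databases, they are skipped automatically when the loop moves on to the next most frequent word.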
The tool takes all the words in the input document without excluding any of the commonly
used words in the Arabic language. However, there is no need to check for these words and remove them,
because they are not included in any of the databases that the tool searches: the tool simply
skips them after failing to find them. All the databases are in text file format and
can easily be updated by the user to increase the size of the training set. In addition to the classification,
the tool outputs another text file containing all the stored words and their frequencies.
Further work:
o More databases can be added to the tool to provide bigger training sets, which should give better results.
o The output text file, which lists all the words along with their frequencies, can be integrated with other text mining tools that use these statistics.
o The tool can be enhanced to use the two or three most frequent words, instead of only the most frequent one, to classify the document. This should result in a better classification.
o Instead of classifying the document by the database that contains the most frequent word, the tool can be modified to report, for each database, the percentage of the stored words that occur in it, as another statistical approximation. The user would then analyze the resulting percentages to assign the document to one of the categories.
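The percentage-based variant proposed above could look like the following minimal sketch. It reports, for each category, the share of word occurrences in the document that are found in that category's database. The function name and toy databases are hypothetical, and counting occurrences rather than distinct words is an assumption.

```python
from collections import Counter


def category_percentages(text, databases):
    """Percentage of word occurrences in the text found in each database."""
    freqs = Counter(text.split())
    total = sum(freqs.values())
    if total == 0:
        return {category: 0.0 for category in databases}
    return {
        category: 100.0 * sum(n for w, n in freqs.items() if w in vocab) / total
        for category, vocab in databases.items()
    }


# Hypothetical toy databases (illustrative only).
TOY_DATABASES = {
    "economic": {"اقتصاد", "سوق", "بنك"},
    "political": {"حكومة", "انتخابات", "وزير"},
}

print(category_percentages("بنك سوق وزير بنك", TOY_DATABASES))
```

The user, or a simple rule such as picking the largest percentage, can then assign the document to a category.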
The following are screen shots from the tool:
The second tool implemented is a sample Arabic WordNet dictionary. It deals only with verbs.
This Arabic text mining tool provides a WordNet for Arabic words only. It can be used to understand the
meaning of the given words, which is useful for many purposes such as classification, clustering, and
summarization of a text. The tool takes as input any document that contains Arabic words only. All the
words in the input file are nouns and all of them are of the form "فعل". The output is another file
containing each word and all of its synonyms.
Method Used:
The general idea behind the tool is to take a file containing only the words whose synonyms are sought. The
tool processes these words one by one. For each word, it searches
another file containing groups of words, where each group contains words with the same meaning. When
the tool finds the target word in one of the groups, it returns that group and stores the target word, followed
by all of its synonyms, in an output file. If the target word is not found in any group, the tool still writes it to
the output file, with a note that no related word was found.
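The lookup method above can be sketched as follows. This is a minimal illustration with hypothetical names and in-memory lists standing in for the input and group files; the synonym groups shown are illustrative only.

```python
def find_synonyms(word, groups):
    """Sequentially scan the groups; return the other members of the first
    group containing the word, or None if no group contains it."""
    for group in groups:
        if word in group:
            return [w for w in group if w != word]
    return None


def lookup_all(words, groups):
    """Build one output line per input word, as the tool writes to its file."""
    lines = []
    for word in words:
        synonyms = find_synonyms(word, groups)
        if synonyms is None:
            lines.append(f"{word}: no related word found")
        else:
            lines.append(f"{word}: {', '.join(synonyms)}")
    return lines


# Hypothetical synonym groups (illustrative only).
GROUPS = [
    ["ذهب", "مضى", "رحل"],
    ["جلس", "قعد"],
]

for line in lookup_all(["جلس", "كتب"], GROUPS):
    print(line)
```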
Further work:
o The training set can be expanded so that a meaning or synonym can be found for any input word.
o Additional training sets can be built for verbs and for other Arabic words that are neither nouns nor verbs.
o The output file is formatted so that it is easy to integrate with other text mining tools and to use for other purposes such as classification, clustering, and text summarization.
o The tool currently uses sequential search because the training set is small. It would be better to use a faster search algorithm; the need for this will arise if the training set is enlarged or if additional data files are added for verbs and other Arabic word types.
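One possible way to speed up the search, sketched below under the assumption that the synonym groups fit in memory, is to build a hash index from each word to its group once, so that each lookup no longer scans every group. The function names and sample groups are hypothetical.

```python
def build_index(groups):
    """Map every word to the synonym group that contains it."""
    index = {}
    for group in groups:
        for word in group:
            # Keep the first group for a repeated word, matching what a
            # sequential scan of the groups would return.
            index.setdefault(word, group)
    return index


def find_synonyms_indexed(word, index):
    """Constant-time lookup of a word's synonyms, or None if unknown."""
    group = index.get(word)
    if group is None:
        return None
    return [w for w in group if w != word]


# Hypothetical synonym groups (illustrative only).
SAMPLE_GROUPS = [
    ["ذهب", "مضى", "رحل"],
    ["جلس", "قعد"],
]

index = build_index(SAMPLE_GROUPS)
print(find_synonyms_indexed("قعد", index))
```

Building the index costs one pass over all groups; after that, each lookup takes roughly constant time regardless of how large the training set grows.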
The following are screen shots from the tool:
The third tool is an Arabic stemmer. The following is a description with screen shots.
Stemming, in data mining and other fields, refers to the process of reducing inflected (or
sometimes derived) words to their stem, base, or root form, generally a written word form. The stem need
not be identical to the morphological root of the word; it is usually sufficient that related words map to the
same stem, even if this stem is not in itself a valid root. The first published stemmer was written by Julie
Beth Lovins in 1968. Stemmers are commonly used in information retrieval and in
commercial products, and they are common elements in query systems such as Web search engines.
Description of The Tool:
This tool is an Arabic word stemmer adapted from the Arabic Stemmer by Shereen
Khoja, in the version by Motaz K. Saad. The tool takes as input a file containing Arabic text. It first
reads and stores all the words in the input file. While reading the file, the tool
removes any tokens that cannot usefully be stemmed, such as numbers (written in
letters), special characters, and symbols. For each word of interest, the tool performs the following checks:
o Check if the word consists of two letters.
o Check if the word consists of three letters.
o Check if the word consists of four letters.
o Check if the word is a pattern.
o Check for a definite article.
o Check for the prefix.
o Check for suffixes.
- The tool uses a large database consisting of the stems of most of the commonly used Arabic words found in news articles, magazines, websites, etc.
- The tool is implemented in Java and is integrated with the Weka tool in order to stem.
- The output stems are stored in a form that can easily be integrated with other tools such as search engines, clustering tools, and classification tools.
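To illustrate the article, prefix, and suffix checks listed above, the following is a deliberately simplified light-stemming sketch. It is not the Khoja algorithm and does not perform the pattern or root-database checks; the affix lists, the stop-word set, the minimum stem length, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical affix and stop-word lists (illustrative, far from complete).
DEFINITE_ARTICLES = ("وال", "بال", "كال", "فال", "ال")  # longest first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ة", "ه", "ي")  # longest first
STOP_WORDS = {"في", "من", "إلى", "بعد", "ضد"}


def light_stem(word, min_length=3):
    """Strip one definite-article prefix and one suffix, keeping at least
    min_length letters so that short words are left untouched."""
    for article in DEFINITE_ARTICLES:
        if word.startswith(article) and len(word) - len(article) >= min_length:
            word = word[len(article):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_length:
            word = word[: -len(suffix)]
            break
    return word


def stem_text(text):
    """Drop stop words, then light-stem every remaining word."""
    return [light_stem(w) for w in text.split() if w not in STOP_WORDS]


print(stem_text("ذهب المعلمون إلى المدرسة"))
```

A full stemmer would go on to match the stripped word against morphological patterns and validate the result against a database of known roots, as the description above indicates.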
‫أن‬
‫ليت‬
‫لعل‬
‫السيما‬
‫واليزال الحالي‬
‫ضمن‬
‫اول‬
‫وله‬
‫ذات‬
‫اي‬
‫بدال‬
‫اليها‬
‫انه‬
‫الذين‬
‫فانه‬
‫إن‬
‫بعد‬
‫ضد‬
‫يلي‬
‫الى‬
‫إلى‬
‫في‬
‫وفي‬
‫من‬
V - REFERENCES
1- Arabic Text Mining Tutorial : http://textminingthequran.com/tutorial/bismillah.html
VI - ROLE(S) OF THE INVESTIGATOR(S)
(Attach a brief CV for each investigator following the format in Appendix A)
#    Name of Investigator       Area of contribution to the project
1    Prof. Ahmed Sameh          System Design & Implementation
2    Asst Prof. Mona Diab       Data Collection & Preparation
3    Dr. Mohamed Tounsi         Data Mining Tools
4    Dr. Noureldean Soufian     System Design & Implementation
5
6
VII - PROJECT SCHEDULE
PHASES OF PROJECT IMPLEMENTATION (SEE GANTT CHART ABOVE)
Steps: 1
Duration (Months): See Gantt Chart within this proposal
Task:
o System requirements specifications: Sameh, Tounsi
o System Architecture: Soufian
o System Design: Sameh
o Databases Designs: Soufian
o Prototyping of critical sub-systems: Tounsi, Sameh
o System Detailed Design: Sameh, Tounsi
o Beta Version Implementation: Sameh, Soufian
o Testing: Soufian
o Building Deployment Environment: Sameh
o Bench Marking and Collecting Results (First Round): Tounsi
o System Tuning (Based on First Round Results): Sameh
o Bench Marking and Collecting Results (Second Round Results): Soufian
o System Tuning (Based on Second Round Results): Sameh
o Bench Marking and Collecting Results (Third Round Results): Tounsi
o Version 1 Release: Tounsi
o Results Documentation and Analysis with the Performance requirements: Sameh
o Detailed Code Documentation: Sameh
o User and Installation Guide (Full How To): Soufian
Total duration for the proposed project: 12 Months
VIII - BUDGET OF THE PROPOSED RESEARCH (Budget in SAR)
Priority: 1 = Max; 2 = Mod; 3 = Low. Amount Approved (SAR): For Official Use.
Item                                              Amount Requested (SAR)   Priority   Amount Approved (SAR)
A. Personnel* (Research Assistant)                24,000                   1
   1- Student Ahmed Al-Jabreen
   2- Student Kamal Qarawi
   3- Student Omar Al-Moughnee
   4- Student Amro Al-Munajjed
B. Equipment* (List)                              5,000                    1
   Development Server
C. Testing and Analysis* (Location/Laboratory)    5,000                    2
   Laptop Computer
D. Consumables* (List)                            1,000                    2
   Desk Tools
E. Travel* (Local/Internat)                       10,000                   1
   1- Travel for Mona Diab (Stanford/Riyadh)
F. Software* (List)                               10,000                   1
   -SAS Data Mining Tools
   -Oracle 9i Data Mining
   -Clementine from SPSS
   -Ants Model Builder
G. Other Items* (Itemize)                         ---
Total Amount Requested (SAR)                      55,000
IX- JUSTIFICATION OF BUDGET (Justify each item listed in the budget in the previous section)
Item                              Justification
A  Students Research Assistants   Salary of SR 500 for each student for 12 months, the duration of the project.
B  Development Server             For developing the proposed experiments.
C  Laptop Computer                For on-site data collection and on-site testing.
D  Desk tools                     For general use by team members.
E  Travel                         For the two outside PSU team members.
F  Software                       Data Mining Tools Software
G
X - RELEASE TIME FOR RESEARCH TEAM MEMBERS
RELEASE TIME FROM TEACHING LOAD
#     Team Member          Time Commitment (hrs/weeks/terms)   Teaching Load Max (e.g. 1 course FA11)
PI    Ahmed Sameh          4 h/w
CI1   Noureldean Soufian   2 h/w
CI2   Mohamed Tounsi       2 h/w
CI3   Mona Diab            1 h/w
CI4                        1 h/w
CI5
XI - EXTERNAL FUNDING
#    Source of Funds   Amount (SAR)   Used for …… costs
1    None
2
3
Appendix A: CV Format for Principal Investigator and Co-Investigators
(Two pages maximum, material should be related to submitted project)
Title and Name: Professor Ahmed Sameh
Specialty: Artificial Intelligence, Modeling and Information Systems
Department and College: Computer Science
Summary of Experience/Achievements Related to Research Proposal:
1- Ahmed Sameh, Ayman Kassem, “Lumbar Spine: Parameter Estimation for Realistic Modelling”, WSEAS
Transactions on Applied and Theoretical Mechanics, ISSN:1991-8747, Issue 5, Volume 2, May 2008
2- Ahmed Sameh, Ayman Kassem, “A General Framework for Lumbar Spine Modelling and Simulation”,
International Journal of Human Factors in Modelling and Simulation, IJHFMS, The North American Spine
Society, Volume 1, Issue 2, January 2008
3- Dalia El-Mansy, Ahmed Sameh, “A Collaborative Inter-Data Grid Strong Semantic Model with Hybrid
Namespaces”, Journal of Software (JSW), Academic Publisher, Volume 3, Issue 1, January 2008
4- Ahmed Sameh, “Simulating Lumbar Spine Motion”, Research in Computing Science (RCS) Journal,
National Polytechnic Institute of Mexico, ISSN 1665-9899, Volume 18, Issue 4, June 2007
5- Ahmed Sameh, and Ayman Kassem, “3D Modeling and Simulation of Lumbar Spine Dynamics”, in the
International Journal of Human Factors Modelling and Simulation , Volume IJHFMS-942, 2007
6-Adhami Louai, Abdel-Malek Karim, McGowan Dennis, Mohamed A. Sameh, "A Partial Surface/Volume
Match for High Accuracy Object Localization", International Journal of Machine Graphics and Vision, vol
10, no. 2, 2001
7-Mohamed A. Sameh, “Interactive Learning in Artificial Neural Networks Through Visualization”, The
International Journal of Computers and Applications (IJCA), Vol. 20, #2, 1998
8- Mohamed A. Sameh and Attia E. Emad, "Parallel 1D and 2D Vector Quantizers Using Kohonen Self-Organizing Neural Network", in the International Journal of the Neural Computing and Applications, V.
(4), no. 2, Springer Verlag, London, 1996
9- Ahmed Sameh, Amgad Madkour, “Intelligent open Spaces: Learning User History Using Neural Network
for Future Prediction of Requested Resources”, Proceedings IEEE CSE'08, 11th IEEE International
Conference on Computational Science and Engineering, 16-18 July 2008, São Paulo, SP, Brazil. IEEE
Computer Society 2008, ISBN 978-0-7695-3193-9
10- Ahmed Sameh, Ayman Kassem, “Modelling and Simulation of Human Lumbar Spine”, Proceedings of
the 2008 International Conference on Modelling, Simulation, and Visualization, MSV 2008, Las Vegas,
Nevada, July 14-17, 2008, CSREA Press 2008, ISBN 1-60132-081-7
11- Ahmed Sameh, Dalia El-Mansy, “A Collaborative Inter-Data Grids Model with Hybrid Namespace”, 14th
IEEE International Conference on Availability, Reliability, and Security, (DAWAM – ARES 2007), Vienna,
Austria, April 10-13, 2007
12- Ahmed Sameh, “Simulating Lumbar Spine Motion: Parameter Estimation for Realistic Modelling”, The
6th Mexican International Conference on Artificial Intelligence (MICAI07), Aguascalientes, Mexico,
November 4-10, 2007
13- Sherif Akoush, Ahmed Sameh, “Bayesian Learning of Neural Networks for Mobile User Position
Prediction”, The International Workshop on Performance Modelling and Evaluation in Computers and
telecommunication Networks (PMECT07)- part of the IEEE 16th International Conference on Computer
Communications and Networks, ICCCN 2007, Honolulu, Hawaii, August 13-16, 2007
14- Ahmed Sameh, “The Schlumberger High Performance Cluster at AUC”, Proceedings of the 13th
International Conference on Artificial Intelligence Applications, Cairo, February 4-6, 2005
15-Mohamed A. Sameh, Rehab El-Kharboutly, "Modeling a Service Discovery Bridge Using Rapide
Architecture Description Language", Proceedings of the 18th European Simulation Multiconference (ESM
2004), Magdeburg, Germany, June 13-16, 2004
16-Mohamed A. Sameh, Rehab El-Kharboutly, and Hazem Al-Ashmawy, "Modeling Wireless Discovery
and Deployment of Hybrid Multimedia N/W-Web Services Using Rapide ADL", Proceedings of the 7th
IEEE International Conference on High Speed N/Ws and Multimedia Communications (HSNMC04),
Toulouse, France, June 30- July 2nd, 2004
17-Mohamed A. Sameh, Rehab El-Kharboutly, "Modeling Jini-UPnP Using Rapide ADL", Proceedings of the
10th EUROMEDIA Conference (EUROMEDIA 2004), Hasselt, Belgium, April 19-21, 2004
18-Mohamed A. Sameh, "E-Access Custom Webber: A Multi-Protocol Stream Controller", Proceedings of
the IADIS International Conference on Applied Computing, Lisbon, Portugal, March 23-26, 2004
19- Ayman Kassem, A. Sameh, and Tony Keller, “Modeling and Simulation of Lumbar Spine Dynamics”,
Proceedings of the 15th IASTED International Conference on Modeling and Simulation and Optimization
(MSO 2004), Marina Del Rey, California, March 2004
20-Mohamed A. Sameh, and Shenouda S., "Tera-Scale High Performance Distributed and Parallel Super-Computing at AUC", Proceedings of the 12th International Conference on Artificial Intelligence, Cairo, Feb.
18-20, 2004
21-Shenouda S., Mohamed L., and Mohamed A. Sameh, "AUC Cluster Participation in Global Grid
Communities", Proceedings of the 12th International Conference on Artificial Intelligence, Cairo, Feb. 18-20, 2004
22-El-Ashmawi Hazem, and Mohamed A. Sameh, “XML-Socket Language-Independent Distributed Object
Computing Model”, Proceedings of the 15th International Conference on Parallel and Distributed
Computing Systems, Louisville, Kentucky, September, 2002
23-Mohamed Karasha, Greenshields Ian, and Mohamed A. Sameh, “HUSKY: A Multi-Agent Architecture
for Adaptive Scheduling of Grid Aware Applications”, Proceedings of the High Performance Computing
Symposium with the 2002 Advanced Simulation Technologies Conference (ASTC 2002), San Diego,
California, April 14-18, 2002
24-Atef Rania, Mohamed A. Sameh, and Abdel-Malek Karim, "Three Dimensional Deformable Modeling of
the Spinal Lumbar Region", Proceedings of the 11th International Conference on Intelligent Systems on
Emerging Technologies (ICIS-2002), Boston, July 18-20, 2002
25-Kassem Ayman, Mohamed A. Sameh, and Abdel-Malek Karim, "A Spring-Dashpot-String Element for
Modeling Spinal Column Dynamics", Proceedings of the International Workshop on Growth and Motion in
3D Medical Images, Copenhagen, Denmark, May 28- June 1, 2002
26-Kassem Ayman, and Mohamed A. Sameh, “A Fast Technique for modeling and Control of Dynamic
System”, Proceedings of the 11th International Conference on Intelligent Systems on Emerging Technologies
(ICIS-2002), Boston, July 18-20, 2002
27-Mohamed A. Sameh, and Kaptan Noha, "Anytime Algorithms for Maximal Constraint Satisfaction",
Proceedings of the ISCA 14th International Conference on Computer Applications in Industry and
Engineering (CAINE' 2001), Nov. 27- 29, at Las Vegas, Nevada, 2001
28-Mohamed A. Sameh, and Mansour Marwa "Enhancing Partitionable Group Membership Service in
Asynchronous Distributed Systems", Proceedings of the ISCA 14th International Conference on Computer
Applications in Industry and Engineering (CAINE' 2001), Nov. 27- 29, at Las Vegas, Nevada, 2001
29-Abdalla Mahmoud, Mohamed A. Sameh, Harras Khalid, Darwich Tarek, "Optimizing TCP in a Cluster of
Low-End Linux Machines", Proceedings of the 3rd WSEAS Symposium on Mathematical Methods and
Computational Techniques in Electrical Engineering, Athens, Greece, Dec. 29-31, 2001
30-Rania Abdel Hamid, and Mohamed A. Sameh, “Visual Constraint Programming Environment for
Configuration Problems”, Proceedings of the 15th International Conference on Computers and their
Applications, New Orleans, Louisiana, March 2000
31-Essam A. Lotfy, and Mohamed A. Sameh, “Applying Neural Networks in Case-Based Reasoning
Adaptation for Cost Assessment of Steel Buildings”, Proceedings of the 10th International Conference on
Computing and Information, ICCI-2000, Kuwait, Nov. 18-21, 2000
32-Ghada A. Nasr, and Mohamed A. Sameh, “Evolution of Recurrent Cascade Correlation Networks with a
Distributed Collaborative Species”, Proceedings of the IEEE Symposium on Computations of Evolutionary
Computation and Neural Networks, San Antonio, TX, May 2000
33-El-Beltagy S., Rafea A., and Mohamed A. Sameh, “An Agent Based Approach to Expert System
Explanation”, Proceedings of the 12th International FLAIRS Conference, Orlando, Florida, 1999
34- Mohamed A. Sameh, Botros A. Kamal, "2D and 3D Fractal Rendering and Animation", Proceedings of
the Seventh Eurographics Workshop on Computer Animation and Simulation, Aug. 31st- Sept. 2nd, in
Poitiers, France, 1996
35-Mohamed A. Sameh, "A Robust Vision System for three Dimensional Facial Shape Acquisition,
Recognition, and Understanding", Proceedings of the 1st Golden West International Conference on
Intelligent Systems, Reno, Nevada, 1991
36-Mohamed A. Sameh, "A Neural Trees Architecture for Fast Control of Motion", Proceedings of the
FLAIRS Artificial Intelligence Conference, Cocoa Beach, Florida, 1991
37-Mohamed A. Sameh, Armstrong W.W., "Towards a Computational Theory for Motion Understanding:
The Expert Animator Model", Proceedings of the 4th International Conference on Artificial Intelligence for
Space Applications, NASA, Huntsville, Alabama, 1988
CV of Mona Diab:
I am a scholar at Stanford University in the linguistics department, working with Daniel Jurafsky, and also part
of the Natural Language Processing lab.
I finished my PhD at the University of Maryland, College Park, where I was in the linguistics department and
was part of the CLIP lab in the University of Maryland Institute of Advanced Computer Studies. I worked
under the supervision of a great advisor Philip Resnik. My thesis, defended in May 2003, is titled Word
Sense Disambiguation within a Multilingual Framework.
Earlier on, 1995-1997, I earned an MSc. degree in Artificial Intelligence (Machine Learning) from the
George Washington University under the supervision of Professor Peter Bock.
I worked in the Center for Spoken Language Research (CSLR) at the University of Colorado at Boulder for
five months as a research associate after graduation, then I moved to Stanford, California in January of 2004.
Research Interests
My main research area is statistical natural language processing. I am specifically involved
in computational semantics, Arabic computational linguistics, semantic processing and
machine learning.
I am interested in cross linguistic similarities and divergences in language use and how these types of
relations can be exploited to solve some of the language processing problems.
The NLaSP coll may be checked here.
Publications
Diab, Mona. Relieving the data acquisition bottleneck for Word Sense Disambiguation.
Proceedings of ACL 2004.[pdf].
Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automatic Tagging of Arabic Text: From raw
text to Base Phrase Chunks. Proceedings of HLT-NAACL 2004.[pdf].
Diab, Mona. An Unsupervised Approach for bootstrapping Arabic Sense Tagging. Proceedings of
Arabic Script Based Languages Workshop, Coling 2004.[pdf].
Diab, Mona and Philip Resnik, An Unsupervised Method for Word Sense Tagging using Parallel
Corpora, Proceedings of ACL, 2002.[ps].
Diab, Mona. An Unsupervised Method for Word Sense Tagging using Parallel Corpora: A
Preliminary Investigation. Special Interest Group in Lexical Semantics (SIGLEX) Workshop,
Association for Computational Linguistics, 2000.[pdf].
Diab, Mona and Steven Finch. A Statistical Word-Level Translation Model for Comparable
Corpora. Proc. of Conference on Content-based Multimedia Information Access (RIAO2000),
2000.[ps].
Resnik, Philip and Mona Diab, Measuring Verb Similarity, Cognitive Science Society
(CogSci2000), 2000.[pdf].
Dorr, Bonnie, Gina Levow, Douglas Oard, Philip Resnik, Amy Weinberg, Mona Diab, Maria
Katsova. MADLIBS: An Event Translingual Lexical Conceptual Structure Based Information
Retrieval System. North American Association for Computational Linguistics, NAACL 2000.
Resnik, Philip, Mari B. Olsen and Mona Diab, The Bible as a Parallel Corpus: Annotating the
`Book of 2000 Tongues', Computers and the Humanities, 33(1-2), 1999.
Diab, Mona, John Schuster and Peter Bock. A Preliminary Statistical Investigation into the impact
of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author
Classification, Proc. of 6th International Conference on Artificial Intelligence & Applications,
Egypt 1998 [ps].
Riopka, Terry, Mona Diab and Peter Bock. Quantifying and Interpreting the Effect of Intelligent
Information. Proc. of 6th International Conference on Artificial Intelligence & Applications, Egypt
1998 [ps].
Software
o We have developed a set of Arabic processing tools in conjunction with our NAACL'04 [paper].
o The tools utilize the Yamcha SVM tools to tokenize, POS tag and Base Phrase Chunk Arabic text.
o You may download our tarred and compressed (55 MB) [package].
o The tools are compiled for a Linux platform. For questions or comments contact [me].
CV of Noureldean Soufian
Publications
· Book: S. Noureddine: Conceptual Development and Quantitative Analysis of an Availability
Enhancing Middleware for Distributed Applications, Mensch & Buch Verlag, Berlin, 2002, ISBN 3-89820-347-6.
· S. Noureddine: A Geometric Programming Approach for the Satisfiability Problem, submitted to
Comp. Intel. Studies, August, 2009.
· S. Noureddine: Some Aspects of Islamic Logic, submitted to Applied Computing and Informatics,
KSA, August, 2009.
· A Geometric Programming Approach for the Satisfiability Problem, submitted to Comp. Intel.
Studies, April, 2009.
· M. Madi, S. Noureddine, A. Fellah: On Cryptology: Origin, Science, and Novel Techniques to
Interactive Data Decryption, First International Conference on Arab's & Muslim's History of
Sciences, UAE, 2008.
· Fellah, S. Noureddine: Deterministic Timed AFA: A New Class of Timed Automata, Journal of
Computer Science, Science Publications, 2007.
· S. Noureddine: Analysis of a New Reduction Calculus for the Satisfiability Problem, Proceedings
of the 9th ALC conference, 2006.
· Fellah, S. Noureddine: Some Succinctness Properties of O-DTAFA, WSEAS Transactions on
Computers, 5(3), March, 2006.
· Y. Chali, S. Noureddine: Document Clustering with Grouping and Chaining Algorithms, In Proc.
of the 2nd International Joint Conference on Natural Language Processing, South Korea, 2005.
· Y. Chali, S. Noureddine: Text Clustering for Natural Language Applications, Journal of Computer
Science, Science Publications, 2005.
· S. Noureddine: A Simple Reduction Calculus for Propositional Logic Formulas, 9th Asian Logic
Conference, Russia, August, 2005.
CV of Mohamed Tounsi
Dr. Mohamed Tounsi
Associate Professor in Computer Science
Specialization: Artificial Intelligence
Short Bio:
Mohamed Tounsi received his PhD in Computer Science, with a specialization in artificial intelligence, from
the University of Nantes, France, in 2002. He was the chairman of the computer science department and
Assistant Professor at the Department of Computer Science, Prince Sultan University, KSA. His
current research interests include constraint programming, meta-heuristics, bioinformatics,
intelligent agents and optimization algorithms. Previously, Dr. Tounsi received his Master of Science
from Paris 9, Dauphine University, Paris, France. Dr. Tounsi has published several papers
in various international journals (see the Publications section). He is currently an editorial
member of various journals in the field of computing and a board member of the Saudi Computer
Society.
Degrees
PhD in Computer Science, specialization in Artificial Intelligence, University of Nantes, France,
2002
M.S. in Computer Science, specialization in Operational Research, University of Paris Dauphine,
France, 1998
Engineer in Operation Research, University of Science and Technology Houari Boumediene,
Algiers, 1995
I. Research interests
Constraint Programming and constraint satisfaction problems, Local Search Methods and
Hybrid Methods, Data Mining, Combinatorial Optimization, Multi-Objective Optimization,
Multi-Criteria Decision Making, Multi-agent modeling and parallel solving, Fuzzy Set
Theory.
II. Current Projects
Data Mining applications in social networks
Data Mining applications in healthcare
Mining Arabic text
Small world based algorithms for optimization problems
Swarm intelligence for solving unconstrained optimization problems
III. Publications
Recent Publications
1. Mohamed Tounsi (2010) “An intelligent bank assessment system: Preliminary Results”
International Journal of Electronic Finance, volume: 4 number: 03 Inderscience Publishing.
2. Mohamed Tounsi (2010) “TTGENERATOR: An Intelligent Solver for Timetabling System”
Journal of Applied Soft Computing, Elsevier publishing (Accepted)
3. Mohamed Tounsi (2010) “New Swarm Intelligence Based Heuristics” Journal of Applied Soft
Computing, Elsevier publishing (Accepted)
4. Mohamed Tounsi et al. (2010) “A multi-criteria approach for job preferences” International
Journal of Data Analysis Techniques and Strategies (IJDATS)
5. Mohamed Tounsi (2010) “a Multi-Objective Heuristics Based for Optimization Problems”
International Journal of Artificial Intelligence and Soft Computing, Inderscience
Publishing. (Accepted)
6. Mohamed Tounsi et al. (2009) “The Role of BPR in the implementation of ERP Systems”,
International Journal of Business Process Management journal. Vol. 15 No. 5. pp. 653-668.
Emerald Publishing.
7. Mohamed Tounsi (2008), “An explanation-based tools for debugging constraint satisfaction
problems”. Journal of Applied Soft Computing, Elsevier publishing (Accepted). 8(4): 1400-1406
(2008)
8. Mohamed Tounsi et al. (2008) “An Iterative local-search framework for solving constraint
satisfaction problem”. Journal of Applied Soft Computing, Elsevier publishing (Accepted) 8(4):
1530-1535 (2008)
9. Mohamed Tounsi et al. (2008) “A Bluetooth intelligent e-healthcare system: analysis and design
issues”. International Journal of Mobile Computing (IJMC) 6(6): 683-695 (2008). Inderscience
Publishing.
10. Mohamed Tounsi (2008) “Toward a General Model for Local Search Technique” Journal of
Applied Computer Informatics. Vol 7 No 1. 2008. Elsevier Publishing.
11. Mohamed Tounsi et al. (2008) “The development of an intelligent Agent Prototype for Mutual
Fund Investment” International Journal of Electronic Finance. Vol 2 No. 3 pp. 300-313. 2008.
Inderscience Publishing.
12. Mohamed Tounsi (2008) “An overview of ILOG Optimization Suite”, Journal of Applied
Computer Informatics. Vol 6 No 2. 2008.
13. Mohamed Tounsi et al. (2008) “Greedy-Based Approach for Solving Data Allocation Problem in a
Distributed Environment”. Proceedings of the International Conference on Parallel and
Distributed Processing Techniques and Applications, PDPTA 2008, Las Vegas, USA, July 14-17,
2008, 975-980.
14. Mohamed Tounsi (2008) “Intelligent System for Bank Assessment: A Preliminary Results” The
19th Saudi National Computer Conference (NCC19). December 2008.
IV. Research Activities
Member of the Editorial Boards of International Journals:
International Journal of Electronic Healthcare (IJEH), Inderscience Eds.
Business Process Management (BPMJ), Emerald Eds.
Applied Computing and Informatics (ACI), SCS Eds.
Reviewer for International Journals:
Applied Artificial Intelligence Journal
Applied Soft Computing Journal
Supercomputing Journal
Business Process Management Journal
New Mathematics and Natural Computation Journal (NMCJ),
Applied Computing and Informatics
Reviewer for various international and national conferences:
International Conference on Artificial Intelligence conference
International Conference on Parallel and Distributed Processing Techniques and
Applications PDPTA conference
IBAMA conference
IASTED Conferences (AIA, MSO)
ROADEF Conference (French conference of operational Research)
JNPC Conference (French conference of solving NP-complete problems)
Member of the Scientific Committees of various conferences:
IASTED Conference, Artificial Intelligence and Applications (AIA), 2009
International Conference on Artificial Intelligence ICAI’2009, Las Vegas, USA
International Conference on Artificial Intelligence ICAI’2008, Las Vegas, USA
ISCAL 2007 Conference.
IASTED Conference, Artificial Intelligence and Applications (AIA), Innsbruck, Austria.
WSEAS Conference, Distance Learning and Web Engineering (DIWEB'2006), Lisbon, Portugal.
NCC18 (18th National Conference on Computer) Riyadh, Saudi Arabia.
International Conference on Artificial Intelligence ICAI’2006, Las Vegas, USA
IASTED Conference
International Conference on PARALLEL AND DISTRIBUTED COMPUTING AND NETWORKS-2005
Workshop Organized
Workshop at the 7th World Multi-Conference on Systemics, Cybernetics and
Informatics (SCI 2003), Florida, USA
Member of Scientific Associations:
Board member of Saudi Computer Society (SCS)
French Association of Operations Research
Appendix B: Evaluations and Approvals
COLLEGE REVIEW COMMITTEE Evaluation and Recommendation
Item/Evaluation                          Excellent   Very Good   Good   Weak
Research methodology
Research objectives
Research originality
Research contribution
Research applicability and relevance
Overall evaluation
Recommendations of College Committee
Approved          Disapproved
Amount of Budget Approved by College Committee:          (SAR)
Chair College Committee - Title and Full Name:
Signature:
Date:     /     /
Recommendations of the College Council
Approved          Disapproved
Dean of the College Council - Title and Full Name:
Signature:
Date:     /     /
PSU INSTITUTIONAL RESEARCH COMMITTEE (IRC) Recommendation
Recommendation of the PSU IRC
Approved          Disapproved
Chair IRC Committee - Title and Full Name:
Signature:
Date:     /     /
PSU EXTERNAL REVIEW PANEL FOR RESEARCH PROPOSALS Recommendation
Recommendation of the External Review Committee
Approved:          Amount of grant approved:          (SAR)
Disapproved:
Postponed:
Directed to:
Chair of External Review Panel - Title and Full Name:
Signature:
Date:     /     /
Recommendation of University Council
Approved          Disapproved
Signature:
Date:     /     /