Paper Identification number 329
Implementation of the ArabiQA Question
Answering System's Components
Yassine Benajiba, Paolo Rosso, Abdelouahid Lyhyaoui
Abstract—Most of the components of a Question Answering (QA) system are language-dependent. For this reason, it is necessary to build such a system component by component, taking into consideration the peculiarities of the target language. In our case, we are interested in building an Arabic QA system (ArabiQA). In previous works, we presented how we implemented an Arabic-oriented Passage Retrieval (PR) system and a Named Entities Recognition (NER) system for Arabic texts, two of the components necessary to build ArabiQA. In this work we present the different components that ArabiQA will contain, with a special focus on the Answer Extraction (AE) module. This module is required to achieve high precision, because it is responsible for extracting the different possible answers from the relevant passages retrieved by the PR module, and it represents the last step before validating the extracted answers and presenting them to the user. The obtained results show that this approach can help significantly to tackle the AE task.
Index Terms—Answer Extraction, Arabic Natural Language
Processing, Question Answering.
I. INTRODUCTION: BACKGROUND AND MOTIVATIONS
Information Retrieval (IR) systems were conceived and designed to return relevant documents related to the user's query. These systems have had great success because they allow internet users to find the documents they need in a fast and efficient way. However, IR systems are unable to satisfy a special kind of user who is interested in obtaining a simple answer to a specific question (with both the question and the answer being formulated in natural language). The research line aiming to satisfy this kind of user is the QA task. The study of the QA task research guidelines [1] reported that there are generally four kinds of questioners, where each type represents questions with a certain level of complexity. The easiest questions are those concerning specific facts, for instance: "Who is the king of Morocco?", "In which city will ICTIS'07 be held?", etc.; the hardest ones need a system able to deduce and decide the answer by itself, because the question might be something like "Is there any evidence of the existence of weapons of mass destruction in Iraq?".

The research work of the first author was supported partially by MAEC-AECI. We would like to thank the PCI-AECI A/7067/06 and MCyT TIN2006-15265-C06-04 research projects for partially funding this work. Y. Benajiba is a Ph.D. student (AECI grant holder) at the Dept. of Informatic Systems and Computation of the Polytechnic University of Valencia, Spain (e-mail: ybenajiba@dsic.upv.es). P. Rosso, Ph.D., is a member of the Natural Language Engineering Laboratory and a permanent lecturer in the Dept. of Informatic Systems and Computation of the Polytechnic University of Valencia, Spain (e-mail: prosso@dsic.upv.es). A. Lyhyaoui, Ph.D., is a member of the Engineering Systems Laboratory and a permanent lecturer at Abdelmalek Essaadi University, Tangier, Morocco (e-mail: lyhyaoui@gmail.com).
We have also conducted a general analysis of the CLEF1 (Cross Lingual Evaluation Forum) and TREC2 (Text Retrieval Conference) results. This general study shows that, up to now, only the first two question levels have been covered by the QA research community. The most successful system in the French monolingual task of CLEF 2006 used intensive Natural Language Processing (NLP) techniques [2] and obtained an accuracy of 68.95%. Other systems [3][4] obtained an accuracy of 52.63% for the Spanish and French monolingual tasks without using any NLP techniques, in order to make their systems as language-independent as possible. In TREC 2005 (the TREC 2006 proceedings have not been published yet) the questions were harder to analyze because of their division into groups, each group being related to a target; therefore, anaphora resolution was also needed. The top accuracy of 53.4% [5] was obtained by means of a sophisticated NER system and a statistical method for answer selection. In [6] an accuracy of 46.4% was obtained using a dependency relation matching technique in the answer extraction module.
The QA task is considered to be difficult both for design
and evaluation. The CLEF organizers offered an evaluation
platform for many languages but unfortunately the Arabic
language is not included among them. The non-existence of a test-bed for the Arabic language makes Arabic QA even more challenging. However, there have been two attempts to build Arabic QA systems, oriented towards: (i) a structured knowledge-based source of information [7]; and (ii) unstructured data [8]. The test-bed of the second QA system was composed of a set of articles from the Raya3 newspaper. In the evaluation process, four Arabic native speakers were asked to give the system 113 questions and judge the correctness of
1 http://www.clef-campaign.org
2 http://trec.nist.gov
3 http://www.raya.com
its answers manually. The reported precision and recall were both 97.3%. These (possibly biased) results seem very high compared with those obtained for other languages in the CLEF 2006 and TREC 2005 competitions. Unfortunately, the test-bed used by the authors is not publicly available, which prevents any comparison with their QA system. In this paper we describe the components of the new ArabiQA system we have been developing. The rest of this paper is structured as follows: in the second section we describe the generic architecture of the system. Special focus is put on named entity recognition in Arabic in the third section. The implementation of the passage retrieval module is presented in the fourth section. The fifth section describes our first attempt at answer extraction, together with the results of the preliminary experiments we carried out. Finally, we draw some conclusions and discuss the future work to be undertaken in order to fully implement the ArabiQA system.
II. ARABIQA GENERIC ARCHITECTURE
Fig. 1. ArabiQA generic architecture
The generic architecture illustrated in Figure 1 has been adopted to design a QA system oriented to unstructured data. From a general viewpoint, the system is composed of the following components: (i) Question Analysis module: it determines the type of the given question (in order to inform the AE module about the expected type of answer), the question keywords (used by the passage retrieval module as a query) and the named entities appearing in the question (which are essential to validate the candidate answers); (ii) Passage Retrieval module: it is the core module of the system; it retrieves the passages estimated as relevant, i.e., likely to contain the answer (see the fourth section for more details); (iii) Answer Extraction module: it extracts a list of candidate answers from the relevant passages (see the fifth section for more details); (iv) Answer Validation module: it estimates, for each candidate answer, the probability of correctness and ranks the candidates from the most to the least probably correct. The first, third and fourth modules need a reliable Named Entities Recognition (NER) system. In our case, we have used a NER system that we designed ourselves [14] (see the third section for more details).
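The flow among the four modules can be sketched as a toy pipeline. Everything below (function names, the stub NER, the keyword-overlap retrieval) is an illustrative placeholder under our own assumptions, not the actual ArabiQA implementation:

```python
import re

# Toy sketch of the four-module pipeline (Fig. 1). All logic here is a
# deliberately naive stand-in for the real modules.

def analyze_question(q):
    """Very rough question analysis: expected answer type + keywords."""
    q_type = "PERSON" if q.lower().startswith("who") else "OTHER"
    stop = {"who", "is", "the", "of"}
    keywords = [w for w in re.findall(r"\w+", q.lower()) if w not in stop]
    return q_type, keywords

def retrieve_passages(keywords, corpus):
    """Keyword-overlap retrieval standing in for JIRS."""
    scored = [(sum(k in p.lower() for k in keywords), p) for p in corpus]
    return [p for s, p in sorted(scored, reverse=True) if s > 0]

def ner(passage):
    """Stub NER: capitalised words are tagged PERSON (placeholder only)."""
    return [(w, "PERSON") for w in re.findall(r"[A-Z][a-z]+", passage)]

def extract_candidates(passages, q_type, tagger):
    """Keep only named entities matching the expected answer type."""
    return [e for p in passages for e, t in tagger(p) if t == q_type]

corpus = ["Mohammed VI is the king of Morocco.",
          "Rabat is the capital of Morocco."]
q_type, kws = analyze_question("Who is the king of Morocco?")
cands = extract_candidates(retrieve_passages(kws, corpus), q_type, ner)
print(cands[0])
```

A real Answer Validation module would then re-rank `cands` by estimated probability of correctness; here the ranking simply follows passage order.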
III. NAMED ENTITIES RECOGNISER
A NER system identifies proper names and temporal and numeric expressions in open-domain text. Such a system is needed in many NLP tasks (IR, QA, Information Extraction, clustering, etc.) because, for many languages, named entities represent around 10% of the text of news articles [15].

As mentioned before (see the second section), a reliable Arabic NER system is needed for most of the components of ArabiQA. Unfortunately, all of the Arabic NER systems known in the literature were built for commercial ends (Siraj4 by Sakhr, ClearTags5 by ClearForest, NetOwlExtractor6 by NetOwl and InxightSmartDiscoveryEntityExtractor7 by Inxight). This motivated us to build our own Arabic NER system.

The absence of capital letters in the Arabic language makes the NER task even more challenging. To tackle this problem, we made a general study of the approaches which have proved successful for the NER task. In the language-independent NER task of the CoNLL 2002 conference8, the best participants used a Maximum Entropy (ME) approach [16][17][18][19]. A comparison between HMM and ME [20] shows that the ME approach gives better results for the NER task. Moreover, [21], presented at NAACL/HLT 20049, reports a language-independent NER system performing on English, Chinese and Arabic texts; using a ME approach, it reached an F-measure of 38.5 for the Arabic language. The corpora used for training and testing are held by the Linguistic Data Consortium10 (LDC).
Our Arabic NER system is also based on the ME approach. This approach tackles the problem better than others because of its feature-based model. For training and testing we manually built our own corpora, which are composed of more than 150,000 tokens annotated with the IOB2 scheme [22]. These corpora are freely available on our website11.
4 http://siraj.sakhr.com/
5 http://www.clearforest.com/index.asp
6 http://www.netowl.com/products/extractor.html
7 http://www.inxight.com/products/smartdiscovery/ee/index.php
8 http://www.cnts.ua.ac.be/conll2002/ner/
9 http://www1.cs.columbia.edu/~pablo/hlt-naacl04/
10 http://www.ldc.upenn.edu
11 http://www.dsic.upv.es/~ybenajiba
For the proper names recognition we have chosen the exponential model, where the probability of a word x belonging to a class c can be expressed as:

p(c|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x,c) \Big)    (1)

Z(x) is a normalization factor and may be expressed as:

Z(x) = \sum_{c'} \exp\Big( \sum_i \lambda_i f_i(x,c') \Big)    (2)

where c is the class, x is the context information and f_i(x,c) is the i-th feature. The features are binary functions indicating how the different classes are related to one or more pieces of information about the context in which the word x appeared.

TABLE I
RESULTS OF OUR ARABIC NER SYSTEM

                Precision   Recall    F-measure
Location        82.17%      78.42%    80.25
Misc            61.54%      32.65%    42.67
Organisation    45.16%      31.04%    36.79
Person          54.21%      41.01%    46.69
Overall         63.21%      49.04%    55.23
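As an illustration, class probabilities under this kind of exponential model can be computed as follows; the feature functions and weights below are toy values made up for the example, not the trained ANERsys model:

```python
import math

# Toy maximum-entropy class probability:
# p(c|x) = exp(sum_i lambda_i * f_i(x, c)) / Z(x), with Z(x) summing the
# same exponential over all classes. Weights/features are illustrative.

def me_probability(context, cls, classes, features, weights):
    def score(c):
        return math.exp(sum(w * f(context, c) for f, w in zip(features, weights)))
    return score(cls) / sum(score(c) for c in classes)  # divide by Z(x)

# Binary features relating a context clue to a class (made-up examples)
features = [
    lambda x, c: 1.0 if x["prev"] == "Mr." and c == "B-PERS" else 0.0,
    lambda x, c: 1.0 if x["prev"] == "in" and c == "B-LOC" else 0.0,
]
weights = [2.0, 1.5]          # the lambda_i, normally learned from data
classes = ["B-PERS", "B-LOC", "O"]

p = me_probability({"prev": "Mr."}, "B-PERS", classes, features, weights)
print(round(p, 3))  # exp(2) / (exp(2) + 1 + 1) ≈ 0.787
```

Only the features firing for the (context, class) pair contribute to the score, which is why ME models absorb overlapping, dependent features so naturally.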
The recognition of temporal and numeric expressions is entirely based on patterns and a small dictionary containing the names of days and months in Arabic, as well as numbers written out in letters. A more detailed description of our Arabic NER system is beyond the scope of this paper; for further information, please see [14].
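A pattern-plus-dictionary recogniser of this kind can be sketched as follows; the word lists here are tiny illustrative samples, not the dictionary actually used:

```python
import re

# Sketch of pattern-based temporal/numeric expression tagging with a
# small dictionary of Arabic day/month names (sample entries only).

MONTHS = {"يناير", "فبراير", "مارس"}   # January, February, March
DAYS = {"الاثنين", "الثلاثاء"}          # Monday, Tuesday
NUMBER = re.compile(r"^\d+$")           # digit-only tokens

def tag_token(token):
    if token in MONTHS or token in DAYS:
        return "TEMPORAL"
    if NUMBER.match(token):
        return "NUMERIC"
    return "O"

# "He arrived on Monday, 15 March"
tokens = "وصل يوم الاثنين 15 مارس".split()
print([tag_token(t) for t in tokens])
```

A fuller version would also match numbers written out in letters and multi-token dates, exactly as the dictionary-plus-pattern description above suggests.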
IV. PASSAGE RETRIEVAL
The Passage Retrieval (PR) module is a core component in a
QA system. In [9] the authors report that the quality of a QA
system depends mainly on the quality of the PR module it
uses. In ArabiQA we have tuned the Java Information
Retrieval System12 (JIRS) in order to be able to process Arabic
text [10].
JIRS is a multi-platform, language-independent system. It is based on a density approach, which has proved to be one of the most successful techniques for the PR task [11][12][13] (see Figure 2).

JIRS uses a Distance Density model to compare the n-grams extracted from the question and the passage in order to determine the relevant passages. The idea of this model is to give more weight to those passages where the question terms appear nearer to each other. To achieve this, JIRS performs two steps. In the first step, it searches for the relevant passages and assigns a weight to each of them. The weight of a passage depends mainly on the relevant question terms appearing in the passage. The weight of a question term can be expressed as:

w_k = 1 - \frac{\log(n_k)}{1 + \log(N)}    (3)

where n_k is the number of passages in which the term associated with the weight w_k appears and N is the total number of passages in the system. The second step uses a model which gives more importance to passages where the question n-grams appear near to each other. This model can be expressed as:

sim(p, q) = \frac{\sum_{x} h(x) \, \frac{1}{d(x, x_{max})}}{\sum_{i=1}^{n} w_i}    (4)

where x is an n-gram of p formed by q terms and the w_i are the weights defined by (3); h(x) can be defined as:

h(x) = \begin{cases} \sum_{k=1}^{q} w_k & \text{if } x \text{ appears in the passage} \\ 0 & \text{otherwise} \end{cases}    (5)

and d(x, x_max) is the factor which expresses the distance between the n-gram x and the n-gram with the maximum weight, x_max:

d(x, x_{max}) = 1 + k \cdot \ln(1 + L)    (6)

where L is the number of terms between x and x_max, and k is a constant factor set empirically.

Fig. 2. JIRS Architecture

12 http://jirs.dsic.upv.es
Figure 3 shows an illustrating example where the first
passage is more relevant for JIRS than the second one because
the keywords capital and Morocco appear nearer to each
other (D=0 in the first passage whereas D=4 in the second
one).
Fig. 3. Illustrating example of the JIRS Distance Density model
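The intuition in this example can be sketched in code. The following is a deliberate simplification of the JIRS Distance Density model (unigram matches only, a single anchor term, assumed term weights and distance constant), not the actual JIRS implementation:

```python
import math

# Simplified sketch of the distance-density idea: question terms that
# appear close to the heaviest matched question term raise the passage
# score. Term weights and the constant k are illustrative assumptions.

def passage_score(question_terms, passage, weights, k=0.1):
    tokens = passage.lower().split()
    positions = {t: tokens.index(t) for t in question_terms if t in tokens}
    if not positions:
        return 0.0
    # anchor: the matched question term with the highest weight (x_max)
    anchor = max(positions, key=lambda t: weights[t])
    score = 0.0
    for t, pos in positions.items():
        gap = abs(pos - positions[anchor])        # terms between t and anchor
        distance = 1.0 + k * math.log(1.0 + gap)  # distance factor d(x, x_max)
        score += weights[t] / distance
    return score / sum(weights.values())          # normalise by total weight

w = {"capital": 1.0, "morocco": 0.8}
near = "rabat is the capital of morocco"
far = "the capital city he visited last in morocco"
print(passage_score(["capital", "morocco"], near, w) >
      passage_score(["capital", "morocco"], far, w))  # → True
```

The passage where "capital" and "morocco" sit closer together scores higher, mirroring the D=0 versus D=4 contrast in the Figure 3 example.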
We have tested JIRS using the same question-type proportions as in CLEF 2006, preparing a test-set to make the evaluation fully automatic. The test-set consists of a list of 200 questions, a snapshot of the Arabic Wikipedia13 (11,000 documents) and a list of the correct answers (all the elements of the test-set are available on our website14). After adapting JIRS to the Arabic language we obtained a coverage (ratio of the number of correctly retrieved passages to the number of correct passages) of 69% and a redundancy (average number of passages returned per question) of 3.28. However, we only reached such coverage and redundancy after performing a light stemming on all the elements of the test bed. This pre-processing operation was necessary because the Arabic language is highly inflectional, which generally causes significant data sparseness. In the near future we plan to run experiments with JIRS on a larger corpus, where information tends to be more redundant, and with various light stemmers, in order to investigate the best technique to tackle the data sparseness mentioned above.
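Light stemming of this kind typically strips a small set of frequent Arabic affixes. A minimal sketch follows; the affix lists are short illustrative samples, not the stemmer actually used with JIRS:

```python
# Minimal sketch of Arabic light stemming: strip a few frequent
# prefixes/suffixes to reduce data sparseness. Affix lists are
# illustrative samples only.

PREFIXES = ["وال", "بال", "ال", "و"]   # e.g. conjunction w-, article al-
SUFFIXES = ["ات", "ين", "ة"]           # e.g. plural/feminine endings

def light_stem(word, min_len=3):
    for p in PREFIXES:                  # longest prefixes tried first
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word

print(light_stem("والكتاب"))  # "and the book" → كتاب ("book")
```

Unlike full root extraction, light stemming keeps the surface stem, which is usually enough to collapse the inflectional variants that cause sparseness in retrieval.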
V. ANSWER EXTRACTION

The AE task is defined as the search for candidate answers within the relevant passages. The task has to take into consideration the type of answer expected by the user [23], which means that the AE module should perform differently for each type of question. Using a NER system together with patterns has proved to be a successful approach to extract answers to factoid questions [8][24][25]. However, difficult questions need semantic parsing to extract the correct answer [26][27][6]. Other approaches suggest using a statistical method [28].

In this section, we describe an AE module oriented to Arabic text and restricted to factoid questions. Our system performs three main steps: (i) the NER system tags all the named entities (NEs) within the relevant passage; (ii) the system makes a pre-selection of the candidate answers, eliminating the NEs which do not correspond to the expected type of answer; (iii) the system decides the final list of candidate answers by means of a set of patterns. Figure 4 shows an illustrating example.

The AE module has been tested automatically with a test-set that we prepared specifically for this task. This test-set consists of: (i) a list of questions of different types; (ii) a list of question types containing the type of each test question; (iii) a list of relevant passages (we manually built a file containing, for each question, a passage which contains the correct answer); and (iv) a list of correct answers containing the correct answer for each question. We used manually selected relevant passages in order to estimate the exact error rate of the AE module.

The measure we used to estimate the quality of performance of our AE module is precision (number of correct answers / number of questions). Using the method described above, we reached a precision of 83.3%.

13 http://ar.wikipedia.org
14 http://www.dsic.upv.es/~ybenajiba

Fig. 4. Illustrating example of the Answer Extraction module's performance steps
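The pre-selection and pattern-decision steps can be sketched as follows; the NE tags, the answer pattern, and the example passage are all illustrative assumptions, not ArabiQA's actual patterns:

```python
import re

# Sketch of the answer-extraction steps described above:
# (ii) keep NEs of the expected answer type,
# (iii) keep those licensed by a hand-made answer pattern.

def extract_answers(tagged_entities, expected_type, patterns, passage):
    # (ii) pre-selection: drop NEs whose type differs from the expected one
    typed = [e for e, t in tagged_entities if t == expected_type]
    # (iii) final decision: keep candidates matching at least one pattern
    final = []
    for e in typed:
        for pat in patterns:
            if re.search(pat.format(ans=re.escape(e)), passage):
                final.append(e)
                break
    return final

passage = "the king of Morocco is Mohammed VI"
entities = [("Morocco", "LOC"), ("Mohammed VI", "PERS")]
patterns = [r"king of \w+ is {ans}"]          # illustrative answer pattern
print(extract_answers(entities, "PERS", patterns, passage))
```

Here the expected answer type (PERS, for a "Who …?" question) removes "Morocco" in step (ii), and the pattern confirms "Mohammed VI" in step (iii).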
VI. CONCLUSION AND FUTURE WORK

In this paper, we have described the generic architecture chosen for ArabiQA, the Arabic QA system that we have been developing. We particularly focused on a first attempt to implement a simple AE module dedicated especially to factoid questions. In order to implement this module, we developed an Arabic NER system and a set of hand-elaborated patterns for each type of question. To make the test automatic, we manually built a special test-set for this task. We reached a precision of 83.3%, which shows that the technique we used allows for the efficient extraction of a list of possible answers to a factoid question. In the near future we plan to improve this module in order to extract answers to questions more difficult to analyse than factoid ones. We are not able to give the accuracy of the complete QA system in this paper because some of its components are still in the implementation phase. The final goal is to complete the implementation of the Question Analysis and Answer Validation components of the ArabiQA system.
REFERENCES

[1] J. Carbonell, D. Harman, E. Hovy, S. Maiorano, J. Prange, and K. Sparck-Jones, Vision Statement to Guide Research in Question & Answering (Q&A) and Text Summarization. Technical report, NIST, 2000.
[2] D. Laurent, P. Séguéla, S. Nègre, Cross Lingual Question Answering using QRISTAL for CLEF 2006. In Working Notes for the CLEF 2006 Workshop.
[3] M. Pérez-Coutiño, M. Montes-y-Gómez, A. López-López, L. Villaseñor-Pineda, A. Pancardo-Rodríguez, A Shallow Approach for Answer Selection based on Dependency Trees and Term Density. In Working Notes for the CLEF 2006 Workshop.
[4] L. Gillard, L. Sitbon, E. Blaudez, P. Bellot, M. El-Bèze, The LIA at QA@CLEF-2006. In Working Notes for the CLEF 2006 Workshop.
[5] S. Harabagiu, D. Moldovan, C. Clark, M. Bowden, A. Hickl, P. Wang, Employing Two Question Answering Systems in TREC-2005. In Proceedings of the Fourteenth Text REtrieval Conference, 2005.
[6] R. Sun, J. Jiang, Y. Fan Tan, H. Cui, T. Chua, M. Kan, Using Syntactic and Semantic Relation Analysis in Question Answering. In Proceedings of the Fourteenth Text REtrieval Conference, 2005.
[7] F. A. Mohammed, K. Nasser, H. M. Harb, A knowledge based Arabic question answering system (AQAS). ACM SIGART Bulletin, pp. 21-33, 1993.
[8] B. Hammou, H. Abu-salem, S. Lytinen, M. Evens, QARAB: A question answering system to support the Arabic language. In Proceedings of the Workshop on Computational Approaches to Semitic Languages, ACL, pp. 55-65, Philadelphia, 2002.
[9] F. Llopis, J. L. Vicedo, A. Ferrandez, Passage Selection to Improve Question Answering. In Proceedings of the COLING 2002 Workshop on Multilingual Summarization and Question Answering, 2002.
[10] Y. Benajiba, P. Rosso, Adapting the JIRS Passage Retrieval System to the Arabic Language. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, CICLing-2007, Springer-Verlag, LNCS (4394), Mexico D.F., Mexico.
[11] C. Amaral, H. Figueira, A. Martins, P. Mendes, C. Pinto, Priberam's Question Answering System for Portuguese. In Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Revised Selected Papers, Vol. 4022 of Lecture Notes in Computer Science, pp. 410-419. Springer, 2006.
[12] A. Ittycheriah, M. Franz, W.-J. Zhu, A. Ratnaparkhi, IBM's Statistical Question Answering System. In Proceedings of the Ninth Text REtrieval Conference, pp. 229-234.
[13] G. G. Lee, J. Seo, S. Lee, H. Jung, B.-H. Cho, C. Lee, B.-K. Kwak, J. Cha, D. Kim, J. An, H. Kim, K. Kim, SiteQ: Engineering a high performance QA system using lexico-semantic pattern matching and shallow NLP. In Proceedings of the Tenth Text REtrieval Conference, pp. 422-451.
[14] Y. Benajiba, P. Rosso, J. M. Benedí, ANERsys: An Arabic Named Entity Recognition System based on Maximum Entropy. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, CICLing-2007, Springer-Verlag, LNCS (4394), Mexico D.F., Mexico, pp. 143-153.
[15] N. Friburger, D. Maurel, Textual Similarity Based on Proper Names. In Proceedings of the MFIR'2002 Workshop at the 25th ACM SIGIR Conference, Tampere, Finland, 2002, pp. 155-167.
[16] O. Bender, F. J. Och, H. Ney, Maximum Entropy Models for Named Entity Recognition. In Proceedings of CoNLL-2003, Edmonton, Canada, 2003.
[17] H. L. Chieu, H. T. Ng, Named Entity Recognition with a Maximum Entropy Approach. In Proceedings of CoNLL-2003, Edmonton, Canada, 2003.
[18] J. R. Curran and S. Clark, Language Independent NER using a Maximum Entropy Tagger. In Proceedings of CoNLL-2003, Edmonton, Canada, 2003.
[19] S. Cucerzan and D. Yarowsky, Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora, pp. 90-99.
[20] R. Malouf, Markov Models for Language-Independent Named Entity Recognition. In Proceedings of CoNLL-2003, Edmonton, Canada, 2003.
[21] R. Florian, H. Hassan, A. Ittycheriah, H. Jing, N. Kambhatla, X. Luo, N. Nicolov and S. Roukos, A Statistical Model for Multilingual Entity Detection and Tracking. In Proceedings of NAACL/HLT 2004.
[22] A. Ratnaparkhi, Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, University of Pennsylvania, 1998.
[23] D. Moldovan, M. Pasca, S. Harabagiu and M. Surdeanu, Performance issues and error analysis in an open-domain question answering system. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), 2002.
[24] R. J. Cooper and S. M. Ruger, A Simple Question Answering System. In Proceedings of TREC-9, 2000.
[25] S. Abney, M. Collins and A. Singhal, Answer Extraction. In Proceedings of ANLP 2000.
[26] D. Mollá and B. Hutchinson, Dependency-based semantic interpretation for answer extraction. In Proceedings of the Australian NLP Workshop, 2002.
[27] S. Harabagiu, D. Moldovan, C. Clark, M. Bowden, A. Hickl, and P. Wang, Employing Two Question Answering Systems in TREC 2005. In Proceedings of the Fourteenth Text REtrieval Conference, 2005.
[28] G. S. Mann, A Statistical Method for Short Answer Extraction. In Proceedings of the ACL-2001 Workshop on Open-Domain Question Answering, Toulouse, France, 2001.