Performance Analysis of Effective Japanese and Chinese Question Answering System

International Journal of Engineering Trends and Technology (IJETT) - Volume 4, Issue 4 - April 2013
Jaspreet Kaur #1, Vishal Gupta *2
# ME Student, Dept. of Computer Science and Engineering, Panjab University, Chandigarh, India
* Assistant Professor, Dept. of Computer Science and Engineering, Panjab University, Chandigarh, India
Abstract — In today's world of technological advancement, Question Answering has emerged as a key area for researchers. In Question Answering, the user is provided with specific answers instead of a large number of documents or passages. Question Answering has been carried out in many languages. This paper compares some existing Question Answering systems for the Chinese and Japanese languages. Different Chinese and Japanese Question Answering systems are compared on the basis of their performance in the workshops held for evaluating Question Answering systems. We also discuss which approach performs best, the reasons it is considered good, and some methods for further improving these techniques.
Keywords — Question Answering, Performance, Evaluation, Algorithms.
I. INTRODUCTION
In this paper, we review previous work on Japanese and Chinese question answering systems, evaluate their performance, and suggest measures for further improvement. Compared with other languages, word segmentation is a key problem in Chinese and Japanese question answering. We review studies on different techniques used for the Japanese and Chinese languages and discuss important issues that are helpful for building QA systems. Machine learning approaches currently represent the mainstream for many QA research issues; we believe that, by efficiently utilizing the resources surveyed here, the performance of machine learning approaches for Chinese question answering can be improved further.
II. JAPANESE QUESTION ANSWERING SYSTEM USING A* SEARCH AND ITS IMPROVEMENT
A. Method
Table 1 describes an improvement on an existing Japanese Question Answering system and shows its performance. A method is proposed that introduces A* search control into the sentential matching mechanism of a Japanese QA system so that turnaround time can be reduced without affecting accuracy. Several measures of the degree of sentence matching and a pseudo voting method are also proposed to improve accuracy. The effectiveness of the newly introduced techniques was examined in an evaluation workshop for question answering systems.
TABLE I. CHARACTERISTICS OF JAPANESE QUESTION ANSWERING SYSTEM

| Method | Purpose | Limitation | MRR (Mean Reciprocal Rank) |
|---|---|---|---|
| A* search control in sentential matching | Reduce turnaround time while maintaining accuracy | Accuracy not sufficiently high | 0.3 |
| Several measures of the degree of sentence matching and a variant of the voting method integrated with the A* search control method | To improve the accuracy of the A* search control method; higher accuracy as compared to the previous approach | - | 0.5 |
B. Details of Experimental System
Experiments were conducted by the author under the different conditions shown in Table 2, using the test collection of QAC2 Subtask 1.
C. Improving Accuracy of Japanese Sentential Matcher
Several off-the-shelf NLP techniques are adopted, each of which treats a different aspect of the sentence [6].
TABLE II. EXPERIMENTAL SYSTEM DETAILS

| Component | Detail |
|---|---|
| Morphological Analyzer | JUMAN 3.61 [1] |
| Dependency Analyzer | KNP 2.0b6 [2] |
| NE recognizer | SVM-based NE recognizer [3] using SVMlight [4] |
| Numerical expression extractor | System by Fujihata et al. [5] |
| Document database (knowledge resource), size: 774 MByte | Mainichi Shimbun newspaper articles in 1998 and 1999; Yomiuri Shimbun newspaper articles in 1998 and 1999 |
| Computer | CPU: Xeon (2.2 GHz) × 2, main memory: 4 GByte (for QA server); CPU: UltraSPARC III Cu (900 MHz) × 2, main memory: 8 GByte (for search engine) |
| Language of Implementation | JPerl 5.005_03 |
The composite matching score is constructed as shown in Eq. (1). It is a linear combination of the following sub-scores for an answer candidate AC in the i-th retrieved sentence L_i with respect to a question sentence L_q having an interrogative Q:

$$S(AC, L_i, L_q) = S_b(AC, L_i, L_q) + S_k(AC, L_i, L_q) + S_d(AC, L_i, L_q) + S_t(AC, L_i, L_q) \qquad (1)$$
Sb(AC, Li , Lq), Sk(AC, Li , Lq), and Sd(AC, Li , Lq) are scores
for measuring the similarity between the context of an
interrogative Q in a question sentence Lq and that of an answer
candidate AC in a retrieved sentence Li . On the other hand,
St(AC, Li , Lq) is a score for representing the consistency
between the answer candidate AC and a question type
expressed by the interrogative Q.
1. Sentence Chaining
A method is proposed in which multiple sentences are treated
as a single sentence [6]. In this method, each parse tree for
sentences L0 . . . Ln in an extracted passage is connected to the
parse tree of the succeeding sentence, if the following
condition with respect to the question sentence Lq is satisfied:
$$\bigcup_{l=0}^{n-1}\bigl(\mathrm{KW}(L_l)\cap \mathrm{KW}(L_q)\bigr) \not\supseteq \mathrm{KW}(L_n)\cap \mathrm{KW}(L_q) \;\;\wedge\;\; \bigcup_{l=0}^{n-1}\bigl(\mathrm{KW}(L_l)\cap \mathrm{KW}(L_q)\bigr) \not\subseteq \mathrm{KW}(L_n)\cap \mathrm{KW}(L_q) \qquad (2)$$
where KW(L_i) is the set of keywords appearing in L_i. The condition holds when L_n has new keywords that do not appear in the sequence of sentences L_0 ... L_{n-1}, and vice versa. Intuitively, when the keywords of the question sentence are scattered over a series of sentences, those sentences are connected. This method is known as sentence chaining.
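As an illustration, the following Python sketch applies the chaining condition to already-segmented sentences. The keyword extractor below is a stand-in assumption (the actual system derives keywords from JUMAN/KNP output), so this is only a sketch of the condition in Eq. (2).

```python
def keywords(sentence, question_keywords):
    """Stand-in for KW(L): keywords of a sentence, restricted to those
    shared with the question (the only ones the condition looks at)."""
    return set(sentence.split()) & question_keywords

def should_chain(previous_sentences, new_sentence, question_keywords):
    """Condition (2): chain L_n to L_0..L_{n-1} when the question keywords
    covered by L_n and by the preceding sentences differ in both directions."""
    covered_so_far = set()
    for s in previous_sentences:
        covered_so_far |= keywords(s, question_keywords)
    covered_new = keywords(new_sentence, question_keywords)
    return (not covered_new <= covered_so_far) and (not covered_so_far <= covered_new)
```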
2. Matching in Terms of Keywords
The matching score in terms of keywords is calculated based on the number of keywords shared by a question and a retrieved sentence. Equation (3) defines the score S_k(AC, L_i, L_q) for an answer candidate AC in a retrieved sentence L_i with respect to a question L_q [6]:

$$S_k(AC, L_i, L_q) = C_k \sum_{k \in SKW(AC, L_i, L_q)} w(k) + C_c \sum_{\langle k, c\rangle \in SKWC(AC, L_i, L_q)} w(k) \qquad (3)$$
$$SKW(AC, L_i, L_q) = (KW(L_i)\setminus\{AC\}) \cap KW(L_q)$$
$$SKWC(AC, L_i, L_q) = (KWC(L_i)\setminus\{\langle AC, c_{AC}\rangle\}) \cap KWC(L_q)$$

where the function KWC(L) returns the set of (keyword, case marker) pairs in the sentence L. The constants C_k and C_c are mixing factors in the composite score of Eq. (1), and c_AC is the case marker of AC. The functions SKW(AC, L_i, L_q) and SKWC(AC, L_i, L_q) return, respectively, the set of keywords and the set of (keyword, case marker) pairs shared by the question L_q and the retrieved sentence L_i. The function w(k) is a weighting function for keyword k, according to a certain global weighting method.
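A minimal Python sketch of Eq. (3) is given below. The keyword and case-marker extraction and the weighting function w are assumptions standing in for the morphological and dependency analysis that the system relies on.

```python
def keyword_match_score(ac, sent_kw, sent_kwc, q_kw, q_kwc, w, ck=1.0, cc=1.0):
    """S_k of Eq. (3).

    sent_kw / q_kw   : sets of keywords in the retrieved sentence / question.
    sent_kwc / q_kwc : sets of (keyword, case_marker) pairs.
    ac               : (answer_candidate, case_marker) tuple.
    w                : global keyword-weighting function, e.g. IDF-like.
    """
    ac_word, ac_case = ac
    skw = (sent_kw - {ac_word}) & q_kw                 # shared keywords
    skwc = (sent_kwc - {(ac_word, ac_case)}) & q_kwc   # shared (keyword, case) pairs
    return ck * sum(w(k) for k in skw) + cc * sum(w(k) for k, _ in skwc)
```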
3. Matching in Terms of 2-Grams
According to the number of character 2-grams shared by a retrieved sentence and a question, the matching score in terms of 2-grams is calculated. The score S_b(AC, L_i, L_q) is defined through an auxiliary function Sb1(s, e, L_i, L_q) built from 2-gram frequencies:

$$S_b(AC, L_i, L_q) = C_b \, Sb1(s, e, L_i, L_q)$$
$$Sb1(s, e, L_i, L_q) = \frac{1}{len(AC)} \sum_{l}\sum_{j} bfreq(L_i, L_q, l, j)$$
$$bfreq(L_i, L_q, l, j) = (j \neq l)\cdot freq(\mathrm{substr}(L_i, j, j+1), L_q)$$

where s and e are the character positions of the start and end of AC in L_i, respectively, substr(L, j, j+1) is the character 2-gram of L starting at position j, freq(x, L) counts the occurrences of x in L, and the constant C_b is a mixing factor in Eq. (1).
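Since the exact weighting is specific to [6], the following Python sketch only illustrates the general idea of character 2-gram matching: count the 2-grams shared by the question and the retrieved sentence, normalized by candidate length. It is not claimed to reproduce the precise formula above.

```python
def char_bigrams(text):
    """Set of character 2-grams of a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def bigram_overlap_score(answer_candidate, retrieved_sentence, question, cb=1.0):
    """Generic character 2-gram matching score: question bigrams that also
    occur in the retrieved sentence, normalized by the candidate length."""
    shared = char_bigrams(retrieved_sentence) & char_bigrams(question)
    return cb * len(shared) / max(len(answer_candidate), 1)
```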
4. Matching in Terms of Dependency Structure
The first element of the dependency vector DV (AC, Ki ) is the
distance between AC and LCA (the lowest common ancestor
of AC and Ki ) . The second element is the distance between Ki
and LCA. The distance is defined as the number of edges
between two nodes. Similarly, dependency vector DV (Q, Ki )
for a question sentence is obtained, where Q is an
interrogative in the sentence. Similarity between dependency
structures of two sentences is calculated using dependency
vectors. First, the similarity between the dependency relation
of a Q to a keyword Ki in the question sentence Lq and that of
the AC to Ki in the retrieved sentence Lj can be regarded as
measures of the appropriateness of AC. The appropriateness
can be represented as a function of the distance between two
dependency vectors DV(Q, Ki ) and DV(AC, Ki ).
$$Sd_1(AC, Q, K_i) = \frac{1}{|DV(Q, K_i) - DV(AC, K_i)|} \qquad (4)$$

Second, the nearness between an answer candidate and the keywords can also be regarded as a measure of the appropriateness of AC; it can be calculated from the length of the dependency vector:

$$Sd_2(AC, Q, K_i) = \frac{1}{|DV(AC, K_i)|} \qquad (5)$$
Considering these two measures of structural similarity, a measure of the degree of matching between the dependency structure of AC in the retrieved sentence and that of Q in the question sentence L_q is defined by Eqs. (6) and (7):

$$Sd(AC, L_i, L_q) = C_d \sum_{K_i} Sd_1'(AC, Q, K_i)\, w(K_i) \qquad (6)$$

$$Sd_1'(AC, Q, K_i) = Sd_1(AC, Q, K_i) + Sd_2(AC, Q, K_i) \qquad (7)$$

where C_d is a mixing factor in Eq. (1) and w(K_i) is the keyword weighting function of Eq. (3).
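The dependency vectors above are built from distances to the lowest common ancestor in the parse tree. The following Python sketch shows how DV(x, k) can be computed when the parse tree is given as a child-to-parent map; this tree representation is an illustrative assumption, not the KNP output format.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root, following a child->parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def dependency_vector(x, k, parent):
    """DV(x, k): edge distances from x and from k to their lowest common ancestor."""
    px, pk = path_to_root(x, parent), path_to_root(k, parent)
    ancestors_of_k = set(pk)
    lca_index_x = next(i for i, n in enumerate(px) if n in ancestors_of_k)
    lca = px[lca_index_x]
    return (lca_index_x, pk.index(lca))

# Example: a tiny tree  root <- A <- B, root <- C
parent = {"A": "root", "B": "A", "C": "root"}
print(dependency_vector("B", "C", parent))   # (2, 1): B is 2 edges, C is 1 edge from the LCA
```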
5. Matching in Terms of Question Type
The matching score S_t(AC, L_i, L_q) in terms of the question type is calculated as follows. First, a question type is estimated from the type of the interrogative and other clue expressions. Second, when the question type is supported by the numerical expression extractor, a set of patterns is used to filter answer candidates. Each retrieved sentence L_i with a suitable numerical expression is passed to the extractor to obtain a triplet (Obj_i, Attr_i, AC_i), where AC_i is a numeric value plus a unit. The matching score for an answer candidate is then defined in Eq. (8), based on the similarity of surface expression between this triplet and the corresponding triplet derived from the question sentence:

$$S_t(AC, L_i, L_q) = C_{num}\Bigl(\alpha_{num}\, R_s(Obj_i, Attr_i, Obj_q, Attr_q) + (1 - \alpha_{num}) \sum_{K \in SKW(AC, L_i, L_q)} w(K)\Bigr) \qquad (8)$$

$$R_s(Obj_i, Attr_i, Obj_q, Attr_q) = \frac{shared\_char(Obj_i, Obj_q)}{2}\Bigl(\frac{1}{len(Obj_i)} + \frac{1}{len(Obj_q)}\Bigr) + \frac{shared\_char(Attr_i, Attr_q)}{2}\Bigl(\frac{1}{len(Attr_i)} + \frac{1}{len(Attr_q)}\Bigr) \qquad (9)$$

where the function shared_char(s1, s2) returns the number of characters shared by the two strings s1 and s2. The function R_s therefore represents the similarity of the objects and the attributes of the two triplets in terms of their surface expression. The parameter α_num (0 ≤ α_num ≤ 1) controls the extent to which the result of the numerical expression extractor is considered, and the constant C_num is a mixing factor in Eq. (1). Third, if the question type belongs to any other simple numeric expression type such as DATE, a set of patterns is applied to the retrieved sentences to filter out answer candidates that are not numeric or that do not have suitable unit expressions.
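The following Python sketch shows one way to realize shared_char and the surface similarity R_s under the reconstruction above; treating shared_char as the size of the character-set intersection is an assumption made for illustration.

```python
def shared_char(s1, s2):
    """Number of characters the two strings have in common
    (set intersection, an assumption about how 'shared' is counted)."""
    return len(set(s1) & set(s2))

def surface_similarity(obj_i, attr_i, obj_q, attr_q):
    """R_s of Eq. (9): surface similarity of the object and attribute parts
    of the sentence triplet and the question triplet."""
    def part(a, b):
        if not a or not b:
            return 0.0
        return shared_char(a, b) * (1.0 / len(a) + 1.0 / len(b)) / 2.0
    return part(obj_i, obj_q) + part(attr_i, attr_q)
```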
D. Pseudo Voting Method in Search Scheme
This method continues searching until n different answer candidates are found when n-best answers are requested [8]; in other words, n different answer candidates reach the goal state, where all of the sub-scores are actually calculated. As a side effect, the system may find further answer candidates that have the same surface expression as one of the candidates that has already reached the goal state [9]. Consequently, frequency information about answer candidates can be partially used by recording every candidate that reaches the goal state during the search [10]. The pseudo voting score S_v(AC, L_q) for an answer candidate AC is defined as

$$S_v(AC, L_q) = \bigl(\log(freq(AC, AnsList)) + 1\bigr)\cdot \max_{L_i} S(AC, L_i, L_q) \qquad (10)$$

where AnsList is the list of answer candidates that have reached the goal state and freq(AC, AnsList) is the number of times AC appears in that list.
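A small Python sketch of the pseudo voting score follows; the list of goal-state candidates and the best matching score are assumed inputs.

```python
import math
from collections import Counter

def pseudo_voting_score(candidate, answer_list, best_match_score):
    """S_v of Eq. (10): boost a candidate's best matching score by the log
    of how often its surface form reached the goal state during the search."""
    frequency = Counter(answer_list)[candidate]
    if frequency == 0:
        return 0.0
    return (math.log(frequency) + 1.0) * best_match_score

# Example: "1998" reached the goal state twice during the search.
print(pseudo_voting_score("1998", ["1998", "Tokyo", "1998"], best_match_score=0.42))
```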
E. Experimental Result
The performance of the systems was evaluated using the MRR (Mean Reciprocal Rank). The following experiments were conducted [12]:

1. Experiment 1: Evaluation of Performance with Respect to System Parameters
In this experiment, the MRR and the turnaround time are examined while varying the values of the system parameters described in Table 3. The result is shown in Figure 2 [6]. Two curves were plotted for the turnaround time: the first includes the processing time of the external search engine, while the second excludes it.

TABLE III. DESCRIPTION OF SYSTEM PARAMETERS
a: Number of answers to be searched
d: Number of documents to be retrieved
ppd: Maximum number of passages retrieved from one document
p: Number of passages to be considered in the retrieved documents
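Since every system in this survey is compared by MRR, a short Python sketch of how the Mean Reciprocal Rank over a question set can be computed is given below; the ranked-answer input format is an assumption for illustration.

```python
def mean_reciprocal_rank(results):
    """MRR over a list of (ranked_answers, correct_answers) pairs.

    For each question the score is 1/rank of the first correct answer
    (0 if none of the returned answers is correct); MRR is the average.
    """
    total = 0.0
    for ranked_answers, correct_answers in results:
        for rank, answer in enumerate(ranked_answers, start=1):
            if answer in correct_answers:
                total += 1.0 / rank
                break
    return total / len(results) if results else 0.0

# Example with two questions: first answered at rank 2, second at rank 1.
print(mean_reciprocal_rank([
    (["Osaka", "Kyoto"], {"Kyoto"}),
    (["1998"], {"1998"}),
]))   # (0.5 + 1.0) / 2 = 0.75
```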
2. Experiment 2: Evaluation of Effectiveness of Proposed Matching Techniques
In order to evaluate the effectiveness of the matching techniques, systems in which one matching function is suppressed are prepared and examined with respect to (1) accuracy in terms of MRR, (2) the average precision of the first answer candidate, and (3) the ratio of questions whose answers are found within the top five answer candidates. The baseline adopts the scoring function Sd'(AC, L_i, L_q) defined in Eq. (11) instead of the matching score Sd(AC, L_i, L_q) for the dependency structure, and does not use the pseudo voting method:
$$Sd'(AC, L_i, L_q) = C_d' \sum_{k \in SKW(AC, L_i, L_q)} \frac{w(k)}{Dist(AC, k)} \qquad (11)$$

where the function Dist(AC, k) returns the distance between AC and k, that is, the number of morphemes between AC and k plus one. The result of the comparison is shown in Fig. 1, where the parameter setting is a = 5, d = 250, ppd = 5, p = 50, which achieves the best performance in the experiment. Figure 1 shows that each proposed matching technique has an effect on improving accuracy.

Fig. 1. Evaluation of effectiveness of proposed matching techniques.

3. Experiment 3: Evaluation of Effectiveness of Proposed Search Control
The following systems are compared in terms of the MRR value and the average processing time needed to find one or five (different) answer candidates. The result is shown in Fig. 2, where the parameter setting is a = 1 or 5, d = 250, ppd = 5, p = 50.

Fig. 2. Comparison of search control methods in terms of the average processing time of matching score and MRR.

Figure 2 shows that the proposed system [A* (Approx. + Max.)] is 12.0 times faster (for one candidate) and 5.8 times faster (for five different candidates) than a system with no search control.
III. AN ANALYSIS OF A HIGH-PERFORMANCE JAPANESE QUESTION ANSWERING SYSTEM
A. Method
This question answering system, SAIQA-QAC2 (System for Advanced Interactive Question Answering), achieved the best performance of MRR = 0.607 in the subtask. SAIQA-QAC2 is an improvement of the previous system SAIQA-Ii, which achieved MRR = 0.46 for QAC1 Subtask 1 at NTCIR QAC [13]. Table 4 describes the improvement on the existing Japanese Question Answering system and shows its performance.
TABLE IV. FEATURES OF A HIGH-PERFORMANCE JAPANESE QUESTION ANSWERING SYSTEM

| Method | Purpose | Limitation | MRR (Mean Reciprocal Rank) | Document Database |
|---|---|---|---|---|
| SAIQA-QAC2 (System for Advanced Interactive Question Answering) | A new proximity-based ranking method, DIDF, is introduced; DIDF is compared with other document retrieval methods, namely IDF, BM25 and MultiText | The main part of the experimental system is implemented in Perl, which is a script language; reimplementation in another programming language that generates native code would be effective. The NE recognizer used is very time consuming and needs improvement | 0.516 | Mainichi Shimbun newspaper articles in 1998 and 1999; Yomiuri Shimbun newspaper articles in 1998 and 1999 (size = 774 MByte) |
The system is composed of four modules: Question Analysis, Document Retrieval, Answer Extraction, and Answer Evaluation.
IREXNE90 is an efficient SVM-based named entity recognizer that achieved F = 90% [15] for the IREX general task [16]. The same recognizer trained only on a publicly available corpus (the CRL NE data) achieved F = 85%; this version is called IREXNE85. IREXNE90 detects named entities and classifies them into eight classes: ORGANIZATION, PERSON, LOCATION, ARTIFACT, DATE, TIME, MONEY, and PERCENT. In general, a fine-grained answer taxonomy improves MRR [17].
B. Description of Each Module
1. Question Analysis
The Question-Analysis Module normalizes questions and determines expected answer types. Question normalization is used to simplify the "answer-type determination rules." After normalization, ALTJAWS, a morphological analyzer based on the Japanese lexicon "Nihongo Goi-taikei" [18], is applied to segment a given question into words, because inter-word spaces are not used in Japanese.
2. Document Retrieval
A new proximity-based document-ranking method is developed that considers all passages of different lengths. With this method, document D's score is defined as the best score over all passages in D:

$$DS^{(\beta)}(D) = \max_{p \subseteq D} PS^{(\beta)}(p)$$

Here, p1 ⊆ p2 means that passage p1 is contained in passage p2; the document D itself is also regarded as a passage. A passage p is represented by a pair of integers [l, r], where l is the location of p's first word and r is the location of p's last word, and the location of a word equals the number of words before it. The passage score is given by the following definition:

$$PS^{(\beta)}([l, r]) = \exp\bigl(-\beta (r - l)\bigr) \sum_{q \in Q([l, r])} idf[q]$$

where idf[q] = log(N / df[q]) is the IDF (Inverse Document Frequency) of query term q, df[q] is the number of documents that contain q, and N is the total number of documents. Q([l, r]) is the set of query terms that appear in the passage [l, r]. β ≥ 0 is a decay factor: if β is large, the scoring function becomes nearsighted and long passages do not produce good scores; if β is small, it becomes farsighted. This method is therefore called DIDF (Decayed IDF), and an efficient algorithm is used for the document ranking. DIDF(0) is equivalent to an IDF-based document scoring function:

$$DS^{(0)}(D) = \sum_{q \in Q(D)} idf[q]$$

Using 2000 in-house questions and about 1000 QAC1 questions, it was found that β's optimal value lies somewhere between 0.005 and 0.0001, but there was not enough time to tune the value before the formal run of QAC2; β = 0.005 was used for the formal run.
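A straightforward (not efficiency-optimized) Python sketch of the DIDF scoring follows; the query-term positions and IDF values are assumed inputs, and the quadratic scan over passage boundaries merely stands in for the efficient algorithm mentioned in the text.

```python
import math

def didf_document_score(query_positions, idf, beta):
    """DIDF document score: the best exp(-beta*(r-l)) * sum(idf) over all
    passages [l, r] whose endpoints are occurrences of query terms
    (the optimum is always attained on such a passage).

    query_positions: dict mapping query term -> sorted word positions in D.
    idf:             dict mapping query term -> IDF weight.
    """
    occurrences = sorted((pos, term) for term, positions in query_positions.items()
                         for pos in positions)
    best = 0.0
    for i, (l, _) in enumerate(occurrences):
        seen = set()
        score = 0.0
        for r, term in occurrences[i:]:
            if term not in seen:          # each query term counts once per passage
                seen.add(term)
                score += idf[term]
            best = max(best, math.exp(-beta * (r - l)) * score)
    return best

# Example: two query terms, beta = 0.005.
print(didf_document_score({"nobel": [3, 40], "prize": [5]},
                          {"nobel": 2.0, "prize": 1.5}, beta=0.005))
```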
3. Answer Extraction
Here, the sub modules of the Answer-Extraction Module are described. As mentioned above, IREXNE90 generates answer candidates, and the REJECT sub module removes inappropriate candidates of a certain fine-grained answer type. For example, school names and hospital names are recognized by IREXNE90 as LOCATION or ORGANIZATION, depending on the context.

4. Answer Evaluation
A simple scoring function is used [21] [22]. A candidate c in a document D is evaluated by a weighted Hanning window function [23] defined as follows:

$$score(c, D) = DS(D) \times \sum_{q \in Q} w[q]\, H\bigl(d(c, q)\bigr)$$

where DS(D) is D's document score, d(c, q) is the distance between c and the nearest position of a query term q, w[q] is the weight of q, and

$$H(d) = \begin{cases} \tfrac{1}{2}\bigl(\cos\tfrac{\pi d}{W} + 1\bigr) & \text{if } 0 \le d \le W \\ 0 & \text{otherwise} \end{cases}$$
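The following Python sketch illustrates the Hanning-window candidate scoring; the per-term weights w[q] and the use of word offsets as the distance measure are assumptions made for illustration.

```python
import math

def hanning(d, window):
    """H(d): weight decaying smoothly from 1 at distance 0 to 0 at the window edge."""
    return 0.5 * (math.cos(math.pi * d / window) + 1.0) if 0 <= d <= window else 0.0

def candidate_score(candidate_pos, query_positions, weights, doc_score, window=30):
    """score(c, D) = DS(D) * sum over query terms of w[q] * H(nearest distance)."""
    total = 0.0
    for term, positions in query_positions.items():
        if positions:
            nearest = min(abs(candidate_pos - p) for p in positions)
            total += weights.get(term, 1.0) * hanning(nearest, window)
    return doc_score * total

# Example: candidate at word 12, query terms at words 10 and 50.
print(candidate_score(12, {"nobel": [10], "prize": [50]},
                      {"nobel": 2.0, "prize": 1.5}, doc_score=3.4, window=30))
```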
C. Contribution of Each Component
The contribution of each component to the performance is analyzed.

1. Contribution of Answer-Extraction sub modules
Table 5 shows the degree to which each sub module of the Answer-Extraction Module contributed to the performance of the entire QA system. According to this table, IREXNE90, NUMEXP, and WSENSE were the main contributors to the system performance [14].

TABLE V. CONTRIBUTION OF SUB MODULES IN THE ANSWER-EXTRACTION MODULE (window sizes: W = 30 Bunsetsus and W = 60 Words)

| Sub module | Top5 | Diff | MRR | Top5 | Diff | MRR | Top5 | Diff | MRR |
|---|---|---|---|---|---|---|---|---|---|
| IREXNE85 | 91 | +91 | 0.37 | 94 | +94 | 0.38 | 91 | +91 | 0.38 |
| IREXNE90 | 100 | +9 | 0.40 | 103 | +9 | 0.41 | 100 | +9 | 0.43 |
| +NUMEXP | 114 | +14 | 0.47 | 117 | +14 | 0.48 | 114 | +14 | 0.49 |
| +SUFFIX | 120 | +6 | 0.50 | 123 | +6 | 0.51 | 121 | +7 | 0.52 |
| +WSENSE | 135 | +15 | 0.57 | 132 | +9 | 0.55 | 134 | +13 | 0.58 |
| +DEFINE | 138 | +3 | 0.58 | 135 | +3 | 0.56 | 137 | +3 | 0.59 |
| +REJECT (Coarse-Grained) | 138 | +0 | 0.61 | 134 | -1 | 0.57 | 139 | +2 | 0.62 |
2. Contribution of the Proximity-Based Document Retrieval Module
Table 6 compares the precision of DIDF with that of baseline methods. The precision at rank R is the ratio of relevant documents among the top R documents. A relevant document is assumed to be a document that contains a correct answer; that is, lenient evaluation is used.
TABLE VI. COMPARISON OF PRECISION OF DIFFERENT DOCUMENT RETRIEVAL METHODS

| Rank | DIDF (0.005) | DIDF (0.001) | MultiText | IDF | BM25 (k1 = 1.2) | BM25 (k1 = 0.1) | BM25 (k1 = 0.0) |
|---|---|---|---|---|---|---|---|
| 5 | 0.609 | 0.613 | 0.604 | 0.579 | 0.579 | 0.626 | 0.575 |
| 10 | 0.524 | 0.543 | 0.526 | 0.515 | 0.503 | 0.541 | 0.507 |
| 20 | 0.448 | 0.475 | 0.444 | 0.448 | 0.428 | 0.462 | 0.442 |
| 30 | 0.407 | 0.429 | 0.398 | 0.413 | 0.390 | 0.419 | 0.406 |
| 40 | 0.379 | 0.396 | 0.369 | 0.383 | 0.364 | 0.389 | 0.378 |
| 50 | 0.358 | 0.371 | 0.347 | 0.365 | 0.344 | 0.369 | 0.359 |
Table 7 compares the MRRs given by different retrieval systems; the top 20 documents were used. According to this table, DIDF gives better scores than MultiText, BM25, or IDF. However, even the difference between BM25 (k1 = 1.2) and DIDF (0.001) was not statistically significant when the RRs were compared.
TABLE VII. COMPARISON OF MRR FOR DIFFERENT RETRIEVAL SYSTEMS

| | BM25 (k1 = 1.2) | BM25 (k1 = 0.1) | MultiText | DIDF (0.005) | DIDF (0.001) | IDF |
|---|---|---|---|---|---|---|
| MRR | 0.580 | 0.601 | 0.626 | 0.622 | 0.628 | 0.617 |
| Top5 | 133 | 139 | 139 | 139 | 141 | 139 |
IV. BOOSTING CHINESE QUESTION ANSWERING WITH TWO LIGHTWEIGHT METHODS: ABSPS AND SCO-QAT
A. Method
The experiments in this article were conducted on a host QA system, ASQA ("Academia Sinica Question Answering system"), which was developed to deal with Chinese-related QA tasks. The system participated in the CLQA C-C (Chinese-to-Chinese) subtasks at NTCIR-5 and NTCIR-6, and achieved state-of-the-art performance in both cases. Questions are first analyzed by the question processing module to obtain keywords, named entities (NEs), and the question type. Then, queries are constructed for passage retrieval according to the question processing results. In the next phase, answer extraction is performed on the retrieved passages to obtain candidate answers, which are then filtered and ranked by the answer filtering module and the answer ranking module, respectively.
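As a rough illustration of the pipeline just described, the following Python skeleton wires the stages together; every stage body here is a placeholder, since the actual ASQA components (segmentation, NER, passage retrieval, SCO-QAT ranking) are not reproduced.

```python
# Skeleton of a factoid QA pipeline in the style described for ASQA.
# All stage implementations are placeholders, not the ASQA code.

def process_question(question):
    """Question processing: keywords, named entities and the question type."""
    keywords = question.split()              # placeholder segmentation
    return {"keywords": keywords, "entities": [], "qtype": "UNKNOWN"}

def retrieve_passages(analysis):
    """Passage retrieval from queries built out of the question analysis."""
    return []                                # placeholder: would query an IR engine

def extract_candidates(passages, analysis):
    """Answer extraction over the retrieved passages."""
    return []                                # placeholder: NE-based extraction

def filter_candidates(candidates, analysis):
    """Answer filtering, e.g. an ABSP-based filter."""
    return candidates

def rank_candidates(candidates, passages, analysis):
    """Answer ranking, e.g. by SCO-QAT scores."""
    return candidates

def answer(question):
    analysis = process_question(question)
    passages = retrieve_passages(analysis)
    candidates = extract_candidates(passages, analysis)
    candidates = filter_candidates(candidates, analysis)
    return rank_candidates(candidates, passages, analysis)
```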
B. Proposed Methods
1. ABSPs: Alignment-Based Surface Patterns
In ASQA, ABSPs are used in an answer filter to identify correct answers with high confidence [25] [26].
1.1 The Alignment Algorithm
Pair-wise sequence alignment (PSA) algorithms that generate templates and match them against new text have been researched extensively [28]. Because the surface patterns extracted from sentences must capture certain morphological similarities, local alignment techniques are employed to generate the surface patterns [27].
To apply the alignment algorithm, word segmentation is performed first; in the following discussion each unit is a word. Templates contain named entity (NE) tags as semantic tags and POS tags as syntactic tags. Consider two sequences X = (x1, x2, ..., xn) and Y = (y1, y2, ..., ym) defined over the alphabet Σ that consists of four kinds of tags: NE tags, POS tags, a raw-word tag for every single word, and a tag "-" for a gap. We assign a scoring function F to measure the similarity of X and Y. F(i, j) is defined as the score of the optimal alignment between the initial segment x1...xi of X and the initial segment y1...yj of Y. F(i, j) is calculated recursively as follows [24]:
$$F(i, 0) = 0,\qquad F(0, j) = 0,\qquad x_i, y_j \in \Sigma \qquad (12)$$

$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + d(x_i, y_j) \\ F(i-1, j) + d(x_i, \text{'-'}) \\ F(i, j-1) + d(\text{'-'}, y_j) \end{cases} \qquad (13)$$
where d(a, b) is the function that determines the degree of similarity between two alphabet letters a and b. It is defined in Eq. (14), where NE(a) denotes the Named Entity (NE) tag of a and POS(a) denotes the POS tag of a:

$$d(a, b) = \max \begin{cases} 1, & a = b \\ 1, & NE(a) = NE(b) \\ 1 - penalty, & POS(a) \approx POS(b) \\ 0, & a \neq b \end{cases} \qquad (14)$$
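The recursion of Eqs. (12)-(14) is the familiar local-alignment dynamic program. Below is a compact Python sketch that fills the F matrix for two tagged token sequences; the token representation (word, POS, NE), the similarity values, and the fixed gap cost are illustrative assumptions rather than the exact parameters used in ASQA.

```python
def similarity(a, b, penalty=0.2):
    """d(a, b) over tagged tokens; each token is a (word, pos, ne) triple."""
    if a[0] == b[0]:                      # identical raw word
        return 1.0
    if a[2] and a[2] == b[2]:             # same named-entity tag
        return 1.0
    if a[1] == b[1]:                      # same (or similar) POS tag
        return 1.0 - penalty
    return 0.0

def local_alignment_matrix(xs, ys, gap=-0.5):
    """F(i, j) of Eq. (13), with a constant gap cost standing in for d(x, '-')."""
    f = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            f[i][j] = max(0.0,
                          f[i - 1][j - 1] + similarity(xs[i - 1], ys[j - 1]),
                          f[i - 1][j] + gap,
                          f[i][j - 1] + gap)
    return f
```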
1.2 ABSP Generation
An ABSP is composed of ordered slots. ABSPs can be generated from a set of sentences by applying the alignment algorithm. Before alignment, the sentences are segmented and tagged with POS by a Chinese segmentation tool, AutoTag. In addition, the sentences are tagged with semantic tags: an NER engine is used to label PERSON, ORGANIZATION, LOCATION, and TIME tags, and a word list is used for "occupation" tags. After this step, the remaining words without any semantic tag are tagged "O". Thus, every segment of a sentence contains a word, a POS tag, and a semantic tag in the format "word/POS tag/semantic tag".
The complete ABSP generation algorithm is detailed in Algorithm 1 [24].
1.2.1 Algorithm 1. ABSP Generation
Input: Question set S = {s1, ..., sn}
Output: A set of uncategorized ABSPs T = {t1, ..., tk}
Comment: Perform pairwise alignment for every two questions
1. T = {};
2. for each question si from s1 to sn-1 do
3.   for each question sj from si to sn do
4.     perform alignment on si and sj, then
5.     pair segments according to the similarity matrix F;
6.     generate a common ABSP t from the aligned pairs with the maximum similarity;
7.     T ← T ∪ {t};
8.   end;
9. end;
10. return T;
1.3 ABSPs Selection
The selection process chooses patterns that can connect the question keywords and the answer. Each generated ABSP is applied to its source passages. When a matched source passage is found, the corresponding terms are extracted from the important slots. If the extracted terms do not contain both the answer and at least one of the important terms of the source question, the ABSP is removed. The details are described in Algorithm 2 [24]. The selected ABSPs are applied as a filter to choose highly confident answers. It is assumed that the words matched by an ABSP have certain relations between them. When a pattern matches the words, a relation is identified and a Related-Terms-Set (RTS) containing the related terms is constructed. If an ABSP matches a passage, an RTS comprising the matched important terms is extracted; more RTSs are constructed if more than one ABSP matches different terms in a passage. If two RTSs contain common elements (i.e., the same term is matched by at least two ABSPs), the idf values of those elements are checked, and if an idf value is higher than a threshold, the two RTSs are merged.
1.3.1 Algorithm 2. ABSPs Selection
Input: A set of ABSPs T = {t1, ..., tk} for selection, the source question Q, the answer A, the source passages S = {s1, ..., sn}
Output: Selected set of ABSPs T' = {t1, ..., ti}
1. T' = {};
2. QTs ← extract important terms from Q;
3. for each sentence si in S do
4.   for each ABSP tj in T do
5.     perform pattern matching on si with tj; if match then
6.       PTs ← extract terms that match the important slots of tj from si;
7.       if PTs contains A and any term in QTs then
8.         T' ← T' ∪ {tj};
9.       end if;
10.    end if;
11.  end;
12. end;
13. return T';
1.4 Relation Extraction and Score Calculation
After all the RTSs for a given question have been constructed, the question's important terms are used to calculate an RTS score. The score is calculated as the ratio of the matched important terms to the question's important terms. Candidate answers contained in RTSs that do not include any of the question's important terms are discarded. If none of the RTSs contains a question's important terms, the question is said not to be covered; since no useful relations for filtering answers could be found, all the answers are retained. After processing all the sentences selected for a question, the candidate answers are ranked by the sum of their RTS scores over the sentences in which they appear, and the top-ranked answer(s) are retained [24].
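A hedged Python sketch of this filtering-and-ranking step follows; the RTS data structure and the interpretation of the score as the fraction of the question's important terms matched are assumptions consistent with the description above.

```python
def rts_score(rts_terms, question_terms):
    """Fraction of the question's important terms that appear in the RTS
    (0 means the RTS gives no evidence and its candidates are dropped)."""
    if not question_terms:
        return 0.0
    return len(rts_terms & question_terms) / len(question_terms)

def rank_by_rts(rts_list, question_terms):
    """rts_list: list of (related_terms_set, candidate_answer) pairs.
    Candidates are ranked by the sum of the scores of the RTSs they occur in."""
    scores = {}
    covered = False
    for terms, candidate in rts_list:
        s = rts_score(terms, question_terms)
        if s > 0:
            covered = True
            scores[candidate] = scores.get(candidate, 0.0) + s
    if not covered:                      # question not covered: keep every answer
        return [c for _, c in rts_list]
    return sorted(scores, key=scores.get, reverse=True)
```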
C. SCO-QAT: Sum of Co-occurrences of Question and Answer Terms
Passages are chosen instead of documents because it is assumed that the co-occurrence information provided by passages is more reliable. The SCO-QAT formula is derived from an expected-confidence-score concept. Let the given answer be A and the given question be Q, where Q consists of a set QT of question terms {qt1, qt2, qt3, ..., qtn}. Based on QT, QC is defined as the set of question term combinations, or more precisely QC = {qc_i | qc_i is a non-empty subset of QT}. The co-occurrence confidence score of an answer A with a question term combination qc_i is calculated as follows [24]:

$$Conf(qc_i, A) = \begin{cases} \dfrac{freq(qc_i, A)}{freq(qc_i)} & \text{if } freq(qc_i) \neq 0 \\ 0 & \text{if } freq(qc_i) = 0 \end{cases} \qquad (15)$$
where freq(X) is the number of retrieved passages in which all the elements of X co-occur. The expected confidence score is defined as

$$\frac{1}{|QC|} \sum_{i=1}^{|QC|} Conf(qc_i, A) \qquad (16)$$

Because |QC| is the same for every answer, it can be removed. As a result, the SCO-QAT formula is

$$SCO\text{-}QAT(A) = \sum_{i=1}^{|QC|} Conf(qc_i, A) \qquad (17)$$

Candidate answers are ranked according to their SCO-QAT scores.
For example, for a question with three question terms qt1, qt2, and qt3 and a candidate answer c1, Eq. (17) expands to the following SCO-QAT score:

$$SCO\text{-}QAT(c1) = \frac{freq(qt1, c1)}{freq(qt1)} + \frac{freq(qt2, c1)}{freq(qt2)} + \frac{freq(qt3, c1)}{freq(qt3)} + \frac{freq(qt1, qt2, c1)}{freq(qt1, qt2)} + \frac{freq(qt1, qt3, c1)}{freq(qt1, qt3)} + \frac{freq(qt2, qt3, c1)}{freq(qt2, qt3)} + \frac{freq(qt1, qt2, qt3, c1)}{freq(qt1, qt2, qt3)}$$
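The following Python sketch computes a SCO-QAT score directly from retrieved passages by enumerating all non-empty question-term combinations, as in Eqs. (15)-(17); the whitespace/substring containment test is a simplification standing in for real Chinese passage matching.

```python
from itertools import combinations

def freq(terms, passages):
    """Number of retrieved passages in which all the given terms co-occur."""
    return sum(all(t in p for t in terms) for p in passages)

def sco_qat(answer, question_terms, passages):
    """Sum over every non-empty combination qc of question terms of
    freq(qc + answer) / freq(qc), skipping combinations that never occur."""
    score = 0.0
    for size in range(1, len(question_terms) + 1):
        for qc in combinations(question_terms, size):
            denom = freq(qc, passages)
            if denom:
                score += freq(qc + (answer,), passages) / denom
    return score

passages = ["the nobel prize 1901 stockholm", "nobel founded the prize", "1901 sweden"]
print(sco_qat("1901", ("nobel", "prize"), passages))   # 0.5 + 0.5 + 0.5 = 1.5
```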
D. Enhancing SCO-QAT with Distance Information
SCO-QAT is improved by integrating distance information, to capture the term density in passages when the number of question terms is small. The extended formulas are:

$$Conf_{dist}(qc_i, A) = \begin{cases} \dfrac{1}{freq(qc_i)} \displaystyle\sum_{j=1}^{n} \dfrac{1}{avgdist(p_j, qc_i, A)} & \text{if } freq(qc_i) \neq 0 \\ 0 & \text{if } freq(qc_i) = 0 \end{cases} \qquad (18)$$

$$SCO\text{-}QAT_{dist}(A) = \begin{cases} \displaystyle\sum_{i=1}^{|QC|} Conf(qc_i, A) & \text{if } |QT| > threshold \\ \displaystyle\sum_{i=1}^{|QC|} Conf_{dist}(qc_i, A) & \text{if } |QT| < threshold \end{cases} \qquad (19)$$

where n denotes the number of retrieved passages; if a passage does not contain qc_i, its contribution to the confidence value is set to 0. As shown in the modified SCO-QAT function in Eq. (19), when the number of question terms is smaller than the threshold, the score switches to Conf_dist. The avgdist function is the average number of characters between the question term combination qc_i and the answer A in passage p_j, calculated as

$$avgdist(p_j, qc_i, A) = \frac{1}{|qc_i|} \sum_{qt \in qc_i} dist(p_j, qt, A)$$

where dist(p_j, qt, A) is the number of characters between qt and A in p_j.
E. Experiments
Three experiments were conducted.

1. Comparing SCO-QAT with Other Ranking Features
Instead of combined features, only single ranking features are examined, since they are more consistent and can be applied to other systems more easily. In addition to SCO-QAT, the following widely used shallow features were tested: density, keyword overlap (KO), IR score, mutual information (MI) score, and answer frequency. The keyword overlap is the ratio of question keywords found in a paragraph. The IR score is calculated by the Lucene information retrieval engine. There are several methods of calculating density [29]. The mutual information score is computed by the PMI method [30] [31]. The experimental results are shown in Table 8.
TABLE VIII. THE PERFORMANCE OF SINGLE FEATURES: "ACCURACY" IS THE RU-ACCURACY, "MRR" IS THE TOP5 RU-MEAN-RECIPROCAL-RANK, AND "EAA" IS THE EXPECTED ANSWER ACCURACY

Data: NTCR5-CC-D200e
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.545 | 0.522 | 0.621 |
| KO | 0.515 | 0.254 | 0.601 |
| Density | 0.375 | 0.368 | 0.501 |
| Frequency | 0.445 | 0.431 | 0.560 |
| IR | 0.515 | 0.425 | 0.598 |
| MI | 0.210 | 0.210 | 0.342 |

Data: IASL-CC-Q465
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.578 | 0.546 | 0.628 |
| KO | 0.568 | 0.247 | 0.618 |
| Density | 0.432 | 0.369 | 0.519 |
| Frequency | 0.413 | 0.406 | 0.486 |
| IR | 0.518 | 0.406 | 0.587 |
| MI | 0.138 | 0.124 | 0.280 |

Data: NTCIR5-CC-T200e
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.515 | 0.515 | 0.586 |
| KO | 0.495 | 0.245 | 0.569 |
| Density | 0.390 | 0.380 | 0.479 |
| Frequency | 0.395 | 0.366 | 0.499 |
| IR | 0.495 | 0.420 | 0.569 |
| MI | 0.155 | 0.290 | 0.138 |

Data: NTCIR6-C-T150
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.413 | 0.406 | 0.495 |
| KO | 0.367 | 0.130 | 0.476 |
| Density | 0.340 | 0.314 | 0.420 |
| Frequency | 0.340 | 0.343 | 0.431 |
| IR | 0.367 | 0.283 | 0.460 |
| MI | 0.167 | 0.142 | 0.281 |
2. Enhancing SCO-QAT with Distance Information
Experiments were carried out on the extended version of SCO-QAT, with the question-term-number threshold in Eq. (19) set to 5. The results are shown in Table 9. According to a paired t-test, SCO-QAT with distance information was significantly more accurate than SCO-QAT at the 0.01 level [24].
TABLE IX. THE PERFORMANCE OF SCO-QAT AND SCO-QAT WITH DISTANCE INFORMATION: "ACCURACY" IS THE RU-ACCURACY, "MRR" IS THE TOP5 RU-MEAN-RECIPROCAL-RANK, AND "EAA" IS THE EXPECTED ANSWER ACCURACY

Data: NTCR5-CC-D200e
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.545 | 0.522 | 0.621 |
| SCO-QAT_Dist | 0.570 | 0.568 | 0.643 |

Data: IASL-CC-Q465
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.578 | 0.546 | 0.628 |
| SCO-QAT_Dist | 0.589 | 0.565 | 0.637 |

Data: NTCIR5-CC-T200e
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.515 | 0.515 | 0.586 |
| SCO-QAT_Dist | 0.535 | 0.538 | 0.597 |

Data: NTCIR6-C-T150
| Feature | Accuracy | EAA | MRR |
|---|---|---|---|
| SCO-QAT | 0.413 | 0.406 | 0.495 |
| SCO-QAT_Dist | 0.453 | 0.449 | 0.565 |

3. ABSP-Based Answer Filter
When the ABSP-based answer filter was used in ASQA on the NTCIR-6 dataset, the RU-accuracy increased from 0.453 to 0.5.
V. PERFORMANCE ANALYSIS OF DESCRIBED QUESTION ANSWERING SYSTEMS
Figure 3 summarizes the performance of the different Question Answering systems described above.

Fig. 3. Performance of Different Question Answering Systems.

As we can see, among the question answering systems described above, SAIQA-QAC2 (System for Advanced Interactive Question Answering) achieves the highest MRR (Mean Reciprocal Rank), i.e. 0.607. SAIQA-QAC2 is an improvement on the previous system SAIQA-Ii, which achieved MRR = 0.46 for QAC1; the improvement was made in the answer-type determination module and the retrieval module. A new proximity-based document retrieval module, DIDF, was developed that performs better than other document retrieval methods. In the experiments, only newspaper articles were used, so additional experiments are required to show the generality of the scoring function; the language and the number of documents may influence the performance.
The Japanese Question Answering System Using A* Search and its Improvement achieved MRR = 0.516. From the viewpoint of computational cost, the absolute average processing time should be reduced further, although the controlled search successfully manages the trade-off between computational cost and accuracy. One of the reasons is that the main part of the experimental system is implemented in Perl, which is a script language; reimplementation in a programming language that generates native code would be effective. Introducing a caching mechanism to reuse the results of sub-processes would also achieve good results, because some sub-processes, such as the NE recognizer used in the study, are very time consuming. These challenges need to be explored.
Boosting Chinese Question Answering with Two Lightweight Methods: ABSPs (Alignment-based Surface Patterns) and SCO-QAT (Sum of Co-occurrences of Question and Answer Terms) achieved RU-Accuracy = 0.535. The two lightweight methods, SCO-QAT and ABSPs, were proposed for use in a state-of-the-art Chinese factoid QA system (ASQA). The methods require fewer resources than heavy methods, such as the parsers and logic provers used in state-of-the-art QA systems for other languages. The ABSP method is a variation of surface pattern methods. It tries to increase question coverage and maintain accuracy by targeting surface patterns at all question types instead of specific question types, combining relations extracted by multiple surface patterns from multiple passages, and incorporating richer semantic tags. By using this strategy, ABSPs achieve 37.33% coverage and 0.911 RU-Accuracy on the questions covered.
The SCO-QAT method utilizes co-occurrence information in retrieved passages. Since it calculates all the co-occurrence combinations without extra access to the corpus or the Web, it is suitable for bandwidth-limited situations. Moreover, SCO-QAT does not require word-ignoring rules to handle missing counts, and it can be combined with other answer-ranking features.
SCO-QAT and ABSPs can be improved in several ways. In both methods, applying rules with taxonomy or ontology resources would solve most canonicalization problems. For SCO-QAT, a better term-weighting scheme would be helpful, and using more syntactic information, such as incorporating surface patterns, would result in more reliable co-occurrence calculations. For ABSPs, more accurate semantic tags, which are usually finer grained, would improve accuracy while maintaining question coverage. Also, to increase question coverage, in addition to the strategies adopted for ABSPs, partial matching could be used, because it allows portions of a surface pattern to be unmatched. Allowing overlapping tags is also a possibility, because some errors are caused by tagging, such as wrong word segmentation.
VI. CONCLUSION
We analysed the question answering systems described above and found that SAIQA-QAC2 (System for Advanced Interactive Question Answering) achieves the highest MRR (Mean Reciprocal Rank), i.e. 0.607. SAIQA-QAC2 is an improvement on the previous system SAIQA-Ii, which achieved MRR = 0.46 for QAC1; the improvement was made in the answer-type determination module and the retrieval module, and a new proximity-based document retrieval module, DIDF, was developed that performs better than other document retrieval modules. The Japanese Question Answering System Using A* Search and its Improvement achieved MRR = 0.516; reimplementation in a programming language that generates native code would make it more efficient. Boosting Chinese Question Answering with Two Lightweight Methods: ABSPs (Alignment-based Surface Patterns) and SCO-QAT (Sum of Co-occurrences of Question and Answer Terms) achieved RU-Accuracy = 0.535; for SCO-QAT, a better term-weighting scheme would be helpful, and using more syntactic information, such as incorporating surface patterns, would result in more reliable co-occurrence calculations.
ACKNOWLEDGMENT
I would like to express my thanks to Mr. Vishal Gupta, Assistant Professor, Department of Computer Science and Engineering, UIET, Panjab University, Chandigarh, for his guidance in accomplishing this task.

REFERENCES
[1] T. Kurohashi and M. Nagao, Japanese Morphological Analysis System JUMAN Version 3.6 Manual, Kyoto University, Japan, 1998.
[2] S. Kurohashi, Japanese Syntactic Parsing System KNP Version 2.0b6 Instruction Manual, Japan, 1998.
[3] H. Yamada, T. Kudo, and Y. Matsumoto, "Japanese Named Entity Extraction Using Support Vector Machine", IPSJ Journal 43, 1 (Jan.), 44-53, Japan, 2002.
[4] T. Joachims, SVMlight - Support Vector Machine. [Online]. Available: http://svmlight.joachims.org/, 2002.
[5] K. Fujihata, M. Shiga, and T. Mori, Extraction of Numerical Expressions by Constraints and Default Rules of Dependency Structure, SIG Notes 2001-NL-145, Information Processing Society of Japan, (Sep.), Japan, 2001.
[6] T. Mori, "Japanese Question-Answering System Using A* Search and Its Improvement", ACM Transactions on Asian Language Information Processing, Vol. 4, No. 3, pp. 280-304, 2005.
[7] M. Murata, M. Utiyama, and H. Isahara, Question Answering System Using Similarity-Guided Reasoning, SIG Notes 2000-NL-135, Information Processing Society of Japan, 2000.
[8] C. L. Clarke, G. V. Cormack, and T. R. Lynam, "Exploiting Redundancy in Question Answering", in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 358-365, 2001.
[9] J. Xu, A. Licuanan, and R. Weischedel, "TREC QA at BBN: Answering Definitional Questions", in Proceedings of the Twelfth Text REtrieval Conference, 2003.
[10] B. Magnini, M. Negri, and R. P. H. Tanev, "Is It the Right Answer? Exploiting Web Redundancy for Answer Validation", in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 425-432, 2003.
[11] J. Fukumoto, T. Kato, and F. Masui, Question Answering Challenge for Five Ranked and List Answers - Overview of NTCIR-4 QAC2 Subtasks 1 and 2, in Working Notes of the Fourth NTCIR Workshop Meeting, 283-290, 2004.
[12] J. Fukumoto, T. Kato, and F. Masui, Question Answering Challenge (QAC-1) - Question Answering Evaluation at NTCIR Workshop 3, in Working Notes of the Third NTCIR Workshop Meeting - Part IV: Question Answering Challenge (QAC1), 1-6, 2002.
[13] Y. Sasaki, H. Isozaki, T. Hirao, K. Kokuryou, and E. Maeda, NTT's QA Systems for NTCIR QAC-1, in Working Notes of the Third NTCIR Workshop Meeting, Part IV: Question Answering Challenge (QAC1), 63-70, 2002.
[14] H. Isozaki, "An Analysis of a High-Performance Japanese Question Answering System", ACM Transactions on Asian Language Information Processing, Vol. 4, No. 3, pp. 263-279, 2005.
[15] H. Isozaki and H. Kazawa, "Efficient Support Vector Classifiers for Named Entity Recognition", in Proceedings of the 19th International Conference on Computational Linguistics, 390-396, 2002.
[16] S. Sekine, and Y. Eriguchi, “Japanese Named Entity Extraction
Evaluation—Analysis of Results”, in Proceedings of the 18th
International Conference on Computational Linguistics, 1106–
1110, 2000.
[17] Y. Ichimura, Y. Saito, T. Sakai, T. Kokubu, and M. Koyama, “A
study of the relations among question answering, Japanese named
entity extraction, and named entity taxonomy (in Japanese)”, in
IPSJ SIG Technical Report NL-161, 17–24, 2004.
[18] S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, H. Nakaiwa, K.
Ogura, Y. Ooyama, and Y. Hayashi, Goi-Taikei—A Japanese
Lexicon (in Japanese), Iwanami Shoten, 1997.
[19] K. S. Jones, S. Walker, and S. E. Robertson, A Probabilistic Model
of Information Retrieval: Development and Comparative
Experiments, Information Processing and Management 36, 779–
840, 2000.
[20] C. L. A. Clarke, G. V. Cormack, and T. R. Lynam, “Exploiting
Redundancy in Question Answering”, in Proceedings of the 24th
Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, 358–365, 2001.
[21] M. Murata, M. Utiyama, and H. Isahara, Japanese Question-Answering System Using Decreased Adding with Multiple Answers,
in Working Notes of NTCIR-4, 353–360, 2004.
[22] J. Suzuki, Y. Sasaki, and E. Maeda, “SVM Answer Selection for
Open-Domain Question Answering”, in Proceedings of the 19th
International Conference on Computational Linguistics, 974–980,
2002.
[23] T. Hirao, Y. Sasaki, and H. Isozaki, “An Extrinsic Evaluation for
Question-Biased Text Summarization on QA Tasks”, in
Proceedings of the Workshop on Automatic Summarization, The
Second Meeting of the North American Chapter of the Association
for Computational Linguistics, 61–68, 2001.
[24] C. W. Lee, M. Y. Day, C. L. Sung, Y. H. Lee, T. J. Jiang, C. W.
Wu, D. W. Shih, Y. R. Chen, and W. L. Hsu, “Boosting Chinese
Question Answering with Two Lightweight Methods: ABSPs and
SCO-QAT”, ACM Transactions on Asian Language Information
Processing, Vol. 7, No. 4, Article 12, 2008.
[25] W. L. Hsu, S. H. Wu, and Y. S. Chen, “Event identification based
on the information map-INFOMAP”, in Proceedings of the IEEE
International Conference on Systems, Man, and Cybernetics
(SMC’01), Tucson, AZ, 1661–1666, 2001.
[26] V. N. Vapnik, The nature of statistical learning theory, Springer,
1995.
[27] C. W. Wu, S. Y. Jan, R. T. H. Tsai, and W. L. Hsu, “On Using
Ensemble Methods for Chinese Named Entity recognition”, in
Proceedings of the 5th SIGHAN Workshop on Chinese Language
Processing, 2006.
[28] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li,
Discovering patterns to extract protein - protein interactions from
full texts, Bioinformatics 20, 3604–3612, 2004.
[29] D. Molla, and M. Gardiner, “Answerfinder Question Answering by
Combining Lexical, Syntactic and Semantic Information”, in
Australasian Language Technology workshop, 2005.
[30] Y. Zhao, Z. M. Xu, Y. Guan, and P. Li, “Insun05QA on QA Track
Of Trec’05”, in Proceedings Of The 14th Text Retrieval
Conference, 2005.
[31] Z. Zheng, Answer Bus question answering system, Human
Language Technology Conference, 24–27, 2002.