Investigating Genetic algorithms to optimize the user query in the
vector space model
Mohammad Othman Nassar
Amman Arab University
moanassar@yahoo.com
Feras Fares Al Mashagba
Amman Arab University
ferasfm79@yahoo.com
Eman Fares Al Mashagba
Irbid Private University
Emanfa71@yahoo.com
Amman, Jordan
Abstract:
This study examines the effectiveness of different Genetic Algorithm (GA) approaches combined with different similarity measures (Cosine, DICE, Jaccard, Inner Product) in the vector space model (VSM), based on an Arabic data collection. Most of the work in this area has been carried out for English text; very little research addresses Arabic text. The nature of Arabic text differs from that of English text, and preprocessing Arabic text is more challenging. For each similarity measure (Cosine, DICE, Jaccard, Inner Product) in the VSM we used and compared ten different GA approaches, based on different fitness functions, mutation strategies and crossover strategies, to find the best strategy and fitness function to use with each similarity measure when the data collection is in Arabic. Our results indicate that the GA approach that uses the one-point crossover operator, point mutation, and Inner Product similarity as a fitness function represents the best IR system in the VSM.
Keywords: information retrieval, vector space model, query optimization, genetic algorithms.
Introduction:
Information retrieval (IR) can be defined as the study of how to determine and retrieve, from a corpus of stored information, the portions that are responsive to a particular query [1].
The vector space model, the Boolean model, the fuzzy set model and the probabilistic retrieval model are the major information retrieval models. A retrieval model computes the similarity between the query and the documents in order to retrieve the documents that best reflect the query. The vector space model has four similarity measures: Cosine, DICE, Jaccard, and Inner Product. The effectiveness of an IR system is evaluated using two measures: precision and recall.
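Precision and recall can be computed directly from the retrieved set and the known relevant set. The following minimal sketch is illustrative (the function name and document IDs are not part of the original system):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 5 retrieved documents are relevant; 4 documents are relevant overall.
p, r = precision_recall(retrieved=[1, 2, 3, 4, 5], relevant=[2, 3, 5, 9])
# p = 0.6, r = 0.75
```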
A GA is an adaptive heuristic search algorithm premised on the evolutionary ideas of natural selection and genetics [3]. The GA approach is important because it can find a global solution to many problems, such as NP-hard problems and machine learning problems, and can also evolve simple programs.
In this paper we investigate the Cosine and Jaccard similarity measures. For each similarity measure we compare ten different genetic algorithm settings (different mutation techniques, different fitness functions, and different crossover techniques) to optimize the user query. As a test bed we use an Arabic data collection composed of 242 documents and 59 queries, where the correct answer for each query is known in advance. This collection has been used in many information retrieval studies, such as [2, 19, 20].
The difficulty of the Arabic language stems from its differences from the Indo-European languages. These differences, discussed by many researchers [2, 20, 13, 14, 15], are syntactic, morphological, and semantic. The differences become clearer when comparing Arabic to English: Arabic is sparser, which means that for the same text length, English words are repeated more often than Arabic words [14, 15]. This sparseness may negatively affect retrieval quality in Arabic [2, 20]. Other differences relate to the complexity of Arabic roots, the existence of many written forms of the same letter, and the marks associated with some letters, which may change the meaning of two otherwise identical words.
The uniqueness and special properties of the Arabic language, its differences from English and other languages, and the lack of similar studies in the literature motivated us to conduct a deep and rich comparative study based on an Arabic data collection. We use the same data collection as [20], which allows us to compare our results directly with theirs.
Previous Studies:
Using GAs in information retrieval systems to optimize the user query is not a new trend, and it will continue, because GAs are powerful and robust optimization techniques. Many such studies exist in the literature, for example [2, 4, 5, 6, 7, 9, 10, 11, 12, 18, 20].
The authors in [8, 4, 6] present several methods, all based on the VSM, including the connectionist Hopfield network, the symbolic ID3/ID5R algorithms, evolution-based genetic algorithms, simulated annealing, neural networks, and genetic programming. They found these techniques promising in their ability to analyze user queries, identify users' information needs, and suggest alternatives for search.
In [9, 11, 7, 5, 12] the VSM was used; the authors tried to improve IR performance by introducing different mutation probabilities, new crossover operations, and new fitness functions for the GA.
Mercy and Naomie [10] propose a data fusion framework based on linear combinations of retrieval status values obtained from a vector space model system and a probability model system. They used a Genetic Algorithm (GA) to find the linear combination of weights, assigned to the scores of the different retrieval systems, that gives the best retrieval performance.
Using GAs to improve the performance of Arabic information retrieval systems is rare in the literature. In [17] the performance of an Arabic information retrieval system based on the vector space model was improved using Genetic Algorithms. The performance was enhanced through an adaptive matching function obtained from a weighted combination of four similarity measures (Inner Product, Cosine, Jaccard and Dice).
Using GAs to improve the query in the vector space model was studied by [20] on an Arabic data collection; the researchers created and compared different fitness functions, mutation strategies and crossover strategies to find the best strategy and fitness function to use with two similarity measures (Dice, Inner Product) in the VSM. This paper studies the remaining similarity measures (Cosine, Jaccard) in the VSM and compares them with the work of [20]. In [2] Nassar and his colleagues used GAs to improve the query in the Boolean model; they created different Genetic Algorithm settings to optimize the user query on an Arabic collection.
Vector Space Model (VSM):
In the VSM, documents and queries are represented as vectors in a multidimensional space whose dimensions are the terms. Lexical scanning is required to identify the terms; an optional stemming process is then applied to the words, and the frequency of the resulting stems is computed. Finally, the query and document vectors are compared using different similarity measures (e.g. Cosine, DICE, Jaccard, Inner Product); Table 1 shows these measures.
Cosine:
  Binary term vectors:   $sim(d, q) = \dfrac{|d \cap q|}{|d|^{1/2}\,|q|^{1/2}}$
  Weighted term vectors: $sim(d_j, q) = \dfrac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\left(\sum_{i=1}^{t} w_{i,j}^{2}\right)^{1/2}\left(\sum_{i=1}^{t} w_{i,q}^{2}\right)^{1/2}}$

Dice:
  Binary term vectors:   $sim(d, q) = \dfrac{2\,|d \cap q|}{|d| + |q|}$
  Weighted term vectors: $sim(d_j, q) = \dfrac{2\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sum_{i=1}^{t} w_{i,j}^{2} + \sum_{i=1}^{t} w_{i,q}^{2}}$

Jaccard:
  Binary term vectors:   $sim(d, q) = \dfrac{|d \cap q|}{|d| + |q| - |d \cap q|}$
  Weighted term vectors: $sim(d_j, q) = \dfrac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sum_{i=1}^{t} w_{i,j}^{2} + \sum_{i=1}^{t} w_{i,q}^{2} - \sum_{i=1}^{t} w_{i,j}\, w_{i,q}}$

Inner Product:
  $sim(d, q) = \sum_{k=1}^{t} d_{ik}\, q_{k}$

Table 1: Different Similarity Measures.
where $w_{i,j}$ and $w_{i,q}$ are the weights of the $i$-th term in document $j$ and in the query, respectively.
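The weighted formulas in Table 1 translate directly into code. The following sketch works over plain lists of term weights (the example vectors are illustrative, not taken from the collection):

```python
import math

def cosine(d, q):
    dot = sum(wd * wq for wd, wq in zip(d, q))
    return dot / (math.sqrt(sum(w * w for w in d)) * math.sqrt(sum(w * w for w in q)))

def dice(d, q):
    dot = sum(wd * wq for wd, wq in zip(d, q))
    return 2 * dot / (sum(w * w for w in d) + sum(w * w for w in q))

def jaccard(d, q):
    dot = sum(wd * wq for wd, wq in zip(d, q))
    return dot / (sum(w * w for w in d) + sum(w * w for w in q) - dot)

def inner_product(d, q):
    return sum(wd * wq for wd, wq in zip(d, q))

d, q = [1, 0, 1], [1, 1, 0]   # binary weights over a 3-term vocabulary
# inner_product(d, q) = 1; cosine ≈ 0.5; dice = 0.5; jaccard ≈ 0.333
```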
Genetic Algorithm (GA)
The GA algorithm flowchart is illustrated in Figure 1. Genetic algorithm operations can be used to
generate new and better generations. As shown in Figure 1 the genetic algorithm operations
include:
1) Reproduction: fittest individuals are chosen based on the fitness function.
2) Crossover: exchanging the genes between two individual chromosomes that are
reproducing. There are many crossover strategies such as n-point crossover [11], restricted
crossover [7], uniform crossover [30], fusion operator [7] and dissociated crossover [7].
For more details about the crossover strategies you can see the related references.
3) Mutation: the process of randomly altering the genes in a particular chromosome. There
are two types of mutation:
a) Point mutation, in which a single gene is changed.
b) Chromosomal mutation, in which some number of genes is changed completely.
[Figure 1 flow: Generate initial population -> Evaluate each individual -> Reproduction -> Crossover -> Mutation -> stopping criteria met? If NO, return to evaluation; if YES, STOP.]
Figure 1: Flowchart for Typical Genetic Algorithm (GA).
GA's are characterized by five basic components [20] as follows:
1) Chromosome representation for the feasible solutions to the optimization problem.
2) Initial population of the feasible solutions.
3) A fitness function that evaluates each solution.
4) Genetic operators that generate a new population from the existing population.
5) Control parameters such as population size, probabilities of the genetic operators, and
number of generations.
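The five components above can be wired into a generic loop. The sketch below is a minimal illustration, assuming an even population size and pure (non-mutating) operator functions; the parameter names and the OneMax toy problem are assumptions, not part of the original system:

```python
import random

def run_ga(init_population, fitness, crossover, mutate,
           pc=0.8, pm=0.7, generations=50):
    """Generic GA loop: fitness-proportional reproduction, crossover with
    probability pc, mutation with probability pm (assumes an even population)."""
    pop = list(init_population)
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        weights = scores if sum(scores) > 0 else None  # uniform if all zero
        parents = random.choices(pop, weights=weights, k=len(pop))
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < pc:
                a, b = crossover(a, b)
            nxt.extend(mutate(x) if random.random() < pm else x for x in (a, b))
        pop = nxt
    return max(pop, key=fitness)

# Toy check on OneMax (maximize the number of 1-genes):
random.seed(7)

def xo(a, b):
    c = random.randint(1, len(a) - 1)          # one cut point
    return a[:c] + b[c:], b[:c] + a[c:]

def flip(c):
    i = random.randrange(len(c))               # flip a single gene
    return c[:i] + [1 - c[i]] + c[i + 1:]

pop0 = [[random.randint(0, 1) for _ in range(12)] for _ in range(20)]
best = run_ga(pop0, sum, xo, flip, generations=40)
```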
Experiment:
In this study we used an IR system based on the VSM, built and implemented by Hanandeh [6], to handle the 242 Arabic abstracts collected from the Proceedings of the Saudi Arabian National Conference [16].
We follow the same procedure as [20], which allows us to compare our results with theirs. The significant terms are extracted from relevant and irrelevant documents and then assigned weights. The binary weights of the terms form a query vector, the query vector is adopted as a chromosome, and the GA is applied to obtain an optimal or near-optimal query vector. We then compared the results of the GA approach with the results of the traditional IR system without a GA.
The details of this study are the same as in [20], except that we used the Cosine and Jaccard similarity measures instead of DICE and Inner Product. The steps are as follows:
1) Representation of the chromosomes.
The chromosomes are represented as follows:
 Binary representation: The chromosomes use a binary representation and are converted to a real representation using a random function.
 Number of genes: The number of genes equals the number of terms with non-zero weights in the query and the feedback documents.
 Chromosome size: The size of a chromosome equals the number of terms in the set (feedback documents + the query set).
 The query vector: The query is represented as a binary vector.
 Term updates: Terms are modified by applying the random function to the term weights.
 GA approach: Each GA approach receives an initial population of chromosomes corresponding to the top 15 documents retrieved by the traditional IR system for that query.
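The representation above can be sketched as follows; the function name and term lists are illustrative, not taken from the actual collection:

```python
def build_chromosome(query_terms, feedback_docs):
    """Chromosome length = number of distinct terms in the set
    (feedback documents + the query set); gene i is 1 when term i
    occurs in the query (binary query vector), else 0."""
    vocab = sorted(set(query_terms).union(*(set(d) for d in feedback_docs)))
    genes = [1 if term in set(query_terms) else 0 for term in vocab]
    return vocab, genes

vocab, genes = build_chromosome(
    ["retrieval", "arabic"],
    [["arabic", "corpus"], ["genetic", "retrieval"]])
# vocab = ['arabic', 'corpus', 'genetic', 'retrieval'], genes = [1, 0, 0, 1]
```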
2) Fitness Function.
A fitness function is a performance measure or reward function that evaluates how good each solution is. In this study the Cosine and Jaccard similarity measures are used as fitness functions.
3) Selection.
Chromosome selection depends on the fitness function: chromosomes with higher fitness values have a higher probability of being selected for the next generation.
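This fitness-proportional (roulette wheel) selection can be sketched as follows; the fitness values and names are illustrative:

```python
import random

def select(population, fitnesses, k):
    """Pick k chromosomes; the chance of picking chromosome i is
    fitnesses[i] / sum(fitnesses), so fitter solutions dominate."""
    return random.choices(population, weights=fitnesses, k=k)

random.seed(0)
pool = ["query_a", "query_b", "query_c"]
picked = select(pool, [0.90, 0.05, 0.05], k=1000)
# "query_a" should account for roughly 90% of the picks
```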
4) Operators.
We used two GA operators to produce offspring chromosomes, which are:
A. Crossover: its function is to mix two chromosomes to form new offspring. In this
paper crossover occurs only with crossover probability Pc (Pc = 0.8). In this study five
crossover strategies were used for the VSM:
1. One-point crossover operator.
2. Restricted crossover operator.
3. Uniform crossover operator.
4. Fusion operator.
5. Dissociated crossover.
B. Mutation: the modification of the gene values of a solution with some probability Pm. In
this experiment we used a mutation probability Pm = 0.7 and two mutation strategies:
1. Point mutation.
2. Chromosomal mutation.
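One-point crossover and point mutation can be sketched as follows (the chromosome values are illustrative):

```python
import random

def one_point_crossover(a, b):
    """Cut both parents at the same random point and swap the tails."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def point_mutation(chrom):
    """Flip one randomly chosen binary gene."""
    i = random.randrange(len(chrom))
    return chrom[:i] + [1 - chrom[i]] + chrom[i + 1:]

random.seed(3)
a, b = [1, 1, 1, 1], [0, 0, 0, 0]
c1, c2 = one_point_crossover(a, b)   # e.g. [1, 0, 0, 0] and [0, 1, 1, 1]
m = point_mutation(a)                # exactly one bit of a is flipped
```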
Finally, based on the previous discussion, we created ten different GA strategies. These
strategies are used with each similarity measure (Cosine, Jaccard):
GA1: GA that uses the one-point crossover operator and point mutation.
GA2: GA that uses the one-point crossover operator and chromosomal mutation.
GA3: GA that uses the restricted crossover operator and point mutation.
GA4: GA that uses the restricted crossover operator and chromosomal mutation.
GA5: GA that uses the uniform crossover operator and point mutation.
GA6: GA that uses the uniform crossover operator and chromosomal mutation.
GA7: GA that uses the fusion operator and point mutation.
GA8: GA that uses the fusion operator and chromosomal mutation.
GA9: GA that uses dissociated crossover and point mutation.
GA10: GA that uses dissociated crossover and chromosomal mutation.
GA Strategies Using Cosine Similarity:
The results for the GA strategies using Cosine similarity are shown in Table 2 and Table 3. From those tables we notice that GA1, GA2, GA4, GA5, GA8, GA9 and GA10 achieve higher improvement than the traditional IR system, with 12.4245%, 6.959051%, 7.394054%, 5.40995%, 7.982538%, 7.255469% and 4.530111% respectively, while GA3, GA6 and GA7 perform worse than the traditional IR system, with -1.36021%, -2.44788% and -1.26468% respectively. This means that GA1, which uses the one-point crossover operator and point mutation, gives the highest improvement over the traditional approach, at 12.4245%.
Recall  | Cosine | GA1   | GA2   | GA3   | GA4   | GA5   | GA6   | GA7   | GA8   | GA9   | GA10
0.1     | 0.132  | 0.165 | 0.151 | 0.133 | 0.135 | 0.150 | 0.133 | 0.130 | 0.135 | 0.137 | 0.141
0.2     | 0.140  | 0.164 | 0.157 | 0.135 | 0.160 | 0.166 | 0.141 | 0.138 | 0.162 | 0.163 | 0.151
0.3     | 0.147  | 0.182 | 0.165 | 0.142 | 0.175 | 0.151 | 0.144 | 0.150 | 0.179 | 0.164 | 0.152
0.4     | 0.151  | 0.166 | 0.167 | 0.149 | 0.161 | 0.149 | 0.150 | 0.146 | 0.167 | 0.167 | 0.159
0.5     | 0.156  | 0.179 | 0.172 | 0.153 | 0.178 | 0.172 | 0.152 | 0.152 | 0.177 | 0.179 | 0.171
0.6     | 0.178  | 0.191 | 0.180 | 0.172 | 0.188 | 0.181 | 0.164 | 0.176 | 0.188 | 0.187 | 0.179
0.7     | 0.183  | 0.193 | 0.181 | 0.181 | 0.193 | 0.181 | 0.181 | 0.179 | 0.189 | 0.188 | 0.190
0.8     | 0.234  | 0.244 | 0.239 | 0.236 | 0.231 | 0.241 | 0.222 | 0.230 | 0.231 | 0.232 | 0.240
0.9     | 0.241  | 0.251 | 0.243 | 0.243 | 0.242 | 0.244 | 0.231 | 0.242 | 0.242 | 0.244 | 0.243
Average | 0.174  | 0.193 | 0.184 | 0.172 | 0.185 | 0.182 | 0.169 | 0.171 | 0.186 | 0.185 | 0.181
Table 2: Average Recall and Precision Values for 59 Queries by Applying GA's on Cosine Similarity.
Recall  | GA1      | GA2      | GA3      | GA4      | GA5      | GA6      | GA7      | GA8      | GA9      | GA10
0.1     | 25.23531 | 14.39394 | 0.757576 | 2.272727 | 13.63636 | 0.757576 | -1.51515 | 2.272727 | 3.787879 | 6.818182
0.2     | 17.14286 | 12.14286 | -3.57143 | 14.28571 | 18.57143 | 0.714286 | -1.42857 | 15.71429 | 16.42857 | 7.857143
0.3     | 23.80952 | 12.2449  | -3.40136 | 19.04762 | 2.721088 | -2.04082 | 2.040816 | 21.76871 | 11.56463 | 3.401361
0.4     | 9.933775 | 10.59603 | -1.3245  | 6.622517 | -1.3245  | -0.66225 | -3.31126 | 10.59603 | 10.59603 | 5.298013
0.5     | 14.74359 | 10.25641 | -1.92308 | 14.10256 | 10.25641 | -2.5641  | -2.5641  | 13.46154 | 14.74359 | 9.615385
0.6     | 7.303371 | 1.123596 | -3.37079 | 5.617978 | 1.685393 | -7.86517 | -1.1236  | 5.617978 | 5.05618  | 0.561798
0.7     | 5.464481 | -1.0929  | -1.0929  | 5.464481 | -1.0929  | -1.0929  | -2.18579 | 3.278689 | 2.73224  | 3.825137
0.8     | 4.273504 | 2.136752 | 0.854701 | -1.28205 | 2.991453 | -5.12821 | -1.7094  | -1.28205 | -0.8547  | 2.564103
0.9     | 4.149378 | 0.829876 | 0.829876 | 0.414938 | 1.244813 | -4.14938 | 0.414938 | 0.414938 | 1.244813 | 0.829876
Average | 12.4245  | 6.959051 | -1.36021 | 7.394054 | 5.40995  | -2.44788 | -1.26468 | 7.982538 | 7.255469 | 4.530111
Table 3: GA's Improvement in Cosine Similarity (GA's Improvement %).
GA Strategies Using Jaccard Similarity:
The results for the GA strategies using Jaccard similarity are shown in Table 4 and Table 5. From those tables we notice that GA1, GA2, GA4, GA5, GA8, GA9 and GA10 achieve higher improvement than the traditional IR system, with 3.790779%, 12.47687%, 7.651593%, 8.639001%, 7.81104%, 7.913123% and 9.920652% respectively, while GA3, GA6 and GA7 perform worse than the traditional IR system, with -1.20423%, -1.72545% and -3.85975% respectively. This means that GA2, which uses the one-point crossover operator and chromosomal mutation, gives the highest improvement over the traditional approach, at 12.47687%.
Recall  | Jaccard  | GA1      | GA2      | GA3      | GA4      | GA5      | GA6      | GA7   | GA8   | GA9   | GA10
0.1     | 0.130    | 0.134    | 0.141    | 0.133    | 0.137    | 0.141    | 0.129    | 0.122 | 0.142 | 0.139 | 0.141
0.2     | 0.170    | 0.176    | 0.199    | 0.165    | 0.182    | 0.184    | 0.165    | 0.162 | 0.182 | 0.185 | 0.191
0.3     | 0.261    | 0.271    | 0.288    | 0.243    | 0.277    | 0.281    | 0.256    | 0.254 | 0.271 | 0.274 | 0.280
0.4     | 0.213    | 0.222    | 0.277    | 0.211    | 0.266    | 0.269    | 0.214    | 0.211 | 0.271 | 0.277 | 0.278
0.5     | 0.355    | 0.377    | 0.387    | 0.342    | 0.373    | 0.375    | 0.345    | 0.333 | 0.373 | 0.377 | 0.385
0.6     | 0.335    | 0.343    | 0.401    | 0.341    | 0.380    | 0.384    | 0.323    | 0.311 | 0.382 | 0.381 | 0.386
0.7     | 0.385    | 0.399    | 0.398    | 0.381    | 0.385    | 0.387    | 0.371    | 0.362 | 0.382 | 0.359 | 0.381
0.8     | 0.389    | 0.401    | 0.415    | 0.392    | 0.406    | 0.410    | 0.385    | 0.389 | 0.407 | 0.411 | 0.414
0.9     | 0.434    | 0.452    | 0.467    | 0.433    | 0.445    | 0.438    | 0.437    | 0.430 | 0.434 | 0.441 | 0.441
Average | 0.296889 | 0.308333 | 0.330333 | 0.293444 | 0.316778 | 0.318778 | 0.291667 | 0.286 | 0.316 | 0.316 | 0.321889
Table 4: Average Recall and Precision Values for 59 Queries by Applying GA's on Jaccard Similarity.
Recall  | GA1      | GA2      | GA3      | GA4      | GA5      | GA6      | GA7      | GA8      | GA9      | GA10
0.1     | 3.076923 | 8.461538 | 2.307692 | 5.384615 | 8.461538 | -0.76923 | -6.15385 | 9.230769 | 6.923077 | 8.461538
0.2     | 3.529412 | 17.05882 | -2.94118 | 7.058824 | 8.235294 | -2.94118 | -4.70588 | 7.058824 | 8.823529 | 12.35294
0.3     | 3.831418 | 10.34483 | -6.89655 | 6.130268 | 7.662835 | -1.91571 | -2.68199 | 3.831418 | 4.980843 | 7.279693
0.4     | 4.225352 | 30.04695 | -0.93897 | 24.88263 | 26.29108 | 0.469484 | -0.93897 | 27.23005 | 30.04695 | 30.51643
0.5     | 6.197183 | 9.014085 | -3.66197 | 5.070423 | 5.633803 | -2.8169  | -6.19718 | 5.070423 | 6.197183 | 8.450704
0.6     | 2.38806  | 19.70149 | 1.791045 | 13.43284 | 14.62687 | -3.58209 | -7.16418 | 14.02985 | 13.73134 | 15.22388
0.7     | 3.636364 | 3.376623 | -1.03896 | 0        | 0.519481 | -3.63636 | -5.97403 | -0.77922 | -6.75325 | -1.03896
0.8     | 3.084833 | 6.683805 | 0.771208 | 4.37018  | 5.398458 | -1.02828 | 0        | 4.627249 | 5.655527 | 6.426735
0.9     | 4.147465 | 7.603687 | -0.23041 | 2.534562 | 0.921659 | 0.691244 | -0.92166 | 0        | 1.612903 | 1.612903
Average | 3.790779 | 12.47687 | -1.20423 | 7.651593 | 8.639001 | -1.72545 | -3.85975 | 7.81104  | 7.913123 | 9.920652
Table 5: GA's Improvement in Jaccard Similarity (GA's Improvement %).
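The improvement percentages in Tables 3 and 5 are the relative gain of each GA's precision over the traditional system at the same recall level. As an illustrative check (precision values taken from Table 4, Jaccard at recall 0.7; the function name is ours):

```python
def improvement(ga_precision, baseline_precision):
    """Relative improvement (%) of a GA run over the traditional IR system."""
    return (ga_precision - baseline_precision) / baseline_precision * 100

# Jaccard, recall 0.7: traditional = 0.385, GA1 = 0.399 (Table 4)
print(round(improvement(0.399, 0.385), 6))  # 3.636364, the Table 5 entry
```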
Comparison between the Best GA's Strategies:
To create a detailed and useful comparison we take the results for DICE and Inner Product from [20] and combine them with our results for Jaccard and Cosine. Table 6 and Figure 2 compare Cosine (GA1), Jaccard (GA2), Dice (GA9) and Inner Product (GA1); only the best GA strategy for each similarity measure (Cosine, DICE, Jaccard, Inner Product) in the VSM is used. From this table we notice that Inner Product (GA1) is the best strategy, outperforming Cosine (GA1), Jaccard (GA2) and Dice (GA9). This means that Inner Product (GA1), which uses the one-point crossover operator, point mutation, and Inner Product similarity as a fitness function, represents the best IR system in the VSM for the Arabic data collection. Figure 2 presents the data of Table 6 graphically.
Recall  | Cosine(GA1) | Jaccard(GA2) | Dice(GA9) | Inner Product(GA1)
0.1     | 0.165       | 0.141        | 0.141     | 0.146
0.2     | 0.164       | 0.199        | 0.197     | 0.208
0.3     | 0.182       | 0.288        | 0.298     | 0.301
0.4     | 0.166       | 0.277        | 0.277     | 0.283
0.5     | 0.179       | 0.387        | 0.402     | 0.405
0.6     | 0.191       | 0.401        | 0.408     | 0.409
0.7     | 0.193       | 0.398        | 0.396     | 0.413
0.8     | 0.244       | 0.415        | 0.412     | 0.437
0.9     | 0.251       | 0.467        | 0.441     | 0.487
Average | 0.193       | 0.330333     | 0.330222  | 0.343222
Table 6: Comparison Between the Best GA Strategies (Each Similarity Measure).
[Figure: precision versus recall curves (recall 0.1 to 0.9, precision 0 to 0.6) for Cosine(GA1), Inner(GA1), Jaccard(GA2) and Dice(GA9).]
Figure 2: Comparison Between the Best GA Strategies (Each Similarity Measures).
Conclusions:
For each similarity measure (Cosine, DICE, Jaccard, Inner Product) in the VSM we compared ten different GA approaches. By calculating the improvement of each approach over the traditional IR system, we noticed that most approaches (GA1, GA2, GA4, GA5, GA8, GA9 and GA10) improved on the traditional IR system. We also noticed that the GA approach that uses the one-point crossover operator, point mutation, and Inner Product similarity as a fitness function represents the best IR system in the VSM for Arabic data collections, with improvements over the traditional approach ranging from 5.626% to 28.0543%.
References:
[1] Tengku M. T. Sembok and C. J. van Rijsbergen, "A simple logical-linguistic document retrieval system", Information Processing & Management, Volume 26, Issue 1, pp. 111-134, 1990.
[2] Mohammad Othman Nassar, Feras Al Mashagba, and Eman Al Mashagba, "Improving the User Query for the Boolean Model Using Genetic Algorithms,"
International Journal of Computer Science Issues (IJCSI), ISSN (online): 1694-0814,
Volume 8, Issue 5, September 2011.
[3] Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning,
Addison-Wesley, 1989.
[4] Hsinchun C., "Machine Learning for Information Retrieval: Neural Networks,
Symbolic Learning, and Genetic Algorithms", Journal of the American Society for
Information Science. Volume 46 Issue 3, April 1995.
[5] D. Vrajitoru, “Crossover improvement for the genetic algorithm in information
retrieval”, Information Processing& Management, 34(4), pp. 405–415, 1998.
[6] Hsinchun C, Ganesan S, Linlin S, "A Machine Learning Approach to Inductive Query
by Examples: An Experiment Using Relevance Feedback, ID3, Genetic Algorithms, and
Simulated Annealing", Journal Of The American Society For Information Science.
49(8):693–705, 1998.
[7] Vicente P., Cristina P., "Order-Based Fitness Functions for Genetic Algorithms
Applied to Relevance Feedback", Journal Of The American Society For Information
Science And Technology, 54(2):152–160, 2003.
[8] Andrew T., "an Artificial Intelligence Approach to Information Retrieval",
Information Processing and Management, 40(4):619-632, 2004.
[9] Rocio C., Carlos Lorenzetti, Ana M., Nelida B., "Genetic Algorithms for Topical
Web Search: A Study of Different Mutation Rates", ACM Trans. Inter. Tech., 4(4):378–
419, 2005.
[10] Mercy T., Naomie S., "A Framework for Genetic-Based Fusion of Similarity
Measures In Chemical Compound Retrieval", International Symposium on Bio-Inspired
Computing, Puteri Pan Pacific Hotel Johor Bahru, 5 - 7 September 2005.
[11] Ahmed A. A. Radwan, Bahgat A. Abdel Latef, Abdel Mgeid A. Ali, and Osman A. Sadek, "Using Genetic Algorithm to Improve Information Retrieval Systems", Proceedings of the World Academy of Science, Engineering and Technology, Volume 17, ISSN 1307-6884, 2006.
[12] Abdelmgeid A., "Applying Genetic Algorithm in Query Improvement Problem",
International Journal "Information Technologies and Knowledge, Vol.1, p 309-316.
2007.
[13] Khoja, S., "APT: Arabic part-of-speech tagger", Proceedings of the Student Workshop at the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001), Pittsburgh, Pennsylvania, pp. 20-26, 2001.
[14] Yahaya, A., "On the complexity of the initial stage of Arabic text processing", First Great Lakes Computer Science Conference, Kalamazoo, Michigan, USA, October 1989.
[15] Goweder, A., De Roeck, A., "Assessment of a Significant Arabic Corpus", Arabic
Natural Language Processing Workshop (ACL2001), Toulouse, France. Downloaded
from: (http://www.elsnet.org/acl2001 arabic.html).
[16] I. Hmedi, G. Kanaan, and M. Evens, "Design and implementation of automatic indexing for information retrieval with Arabic documents", Journal of the American Society for Information Science, Volume 48, Issue 10, pp. 867-881, 1997.
[17] Bassam Al-Shargabi, Islam Amro, and Ghassan Kanaan, "Exploit Genetic Algorithm
to Enhance Arabic Information Retrieval", 3rd International Conference on Arabic
Language Processing (CITALA’09), Rabat, Morocco, pp. 37-41, 2009.
[18] Fatemeh Dashti and Solmaz Abdollahi Zad, "Optimizing the data search results in web using Genetic Algorithm", International Journal of Advanced Engineering and Technologies, Vol. 1, Issue No. 1, pp. 016-022, ISSN: 2230-781, 2010.
[19] Mohammad Othman Nassar, Ghassan Kanaan, and Hussain A. H. Awad, "Comparison between different global weighting schemes," Lecture Notes in Engineering and Computer Science, ISSN: 2078-0966 (online), 2078-0958 (print), Volume 2180, Issue 1, pp. 690-692, 2010, published by Newswood Limited.
[20] Eman Al Mashagba, Feras Al Mashagba, and Mohammad Othman Nassar, "Query
Optimization Using Genetic Algorithms in the Vector Space Model," International
Journal of Computer Science Issues (IJCSI), ISSN (online): 1694-0814, Volume 8, Issue
5, September 2011.