
Neural Network Model for Semantic Analysis of Sanskrit Text

International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018 • ISSN: 1947-928X • eISSN: 1947-9298
An official publication of the Information Resources Management Association
EDITORS-IN-CHIEF
Vikrant Bhateja, Shri Ramswaroop Memorial Group of Professional Colleges, India
Nilanjan Dey, Techno India College of Technology, India
INTERNATIONAL ADVISORY BOARD
Amira S. Ashour, Tanta University, Saudi Arabia
Valentina Balas, Aurel Vlaicu University of Arad, Romania
Suresh Chandra Satapathy, Prasad V. Potluri Siddhartha Institute of Technology, India
ASSOCIATE EDITORS
Martyn Amos, Manchester Metropolitan University, UK
Wolfgang Banzhaf, Memorial University of Newfoundland, Canada
Peter Bentley, University College London, UK
Marta Cimitile, Unitelma Sapienza, Rome, Italy
Carlos Artemio Coello, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico
Ernesto Costa, University of Coimbra, Portugal
Fabrício Olivetti de França, Federal University of ABC, Brazil
Pedro P.B. de Oliveira, Universidade Presbiteriana Mackenzie, Brazil
Mika Hirvensalo, University of Turku, Finland
Luis Mateus Rocha, Indiana University, USA
Jon Timmis, University of York, UK
Fernando Jose Von Zuben, State University of Campinas, Brazil
EDITORIAL REVIEW BOARD
Arturo Hernandez Aguirre, Centre for Research in Mathematics, Mexico
Gina Maira Barbosa de Oliveira, Federal University of Uberlândia, Brazil
Mohamed Amine Boudia, GeCode Laboratory, Department of Computer Science Tahar Moulay University of Saida, Algeria
Guilherme Coelho, University of Campinas, Brazil
Luis Correia, University of Lisbon, Portugal
Carlos Cotta, Universidad de Málaga, Spain
Nareli Cruz-Cortès, Center for Computing Research of the National Polytechnic Institute, Mexico
Lauro Cássio Martins de Paula, Federal University of Goiás, Brazil
Renato Dourado Maia, State University of Montes Claros, Brazil
Rusins Freivalds, University of Latvia, Latvia
Pierluigi Frisco, Heriot-Watt University, UK
Henryk Fukś, Brock University, Canada
Pauline Haddow, Norwegian University of Science and Technology, Norway
Emma Hart, Edinburgh Napier University, UK
Andy Hone, University of Kent at Canterbury, UK
Greg Hornby, University of California, USA
Ahmad Rezaee Jordehi, University Putra Malaysia, Malaysia
Sanjeev Kumar, Perot Systems Corporation, USA
André Leier, University of Alabama at Birmingham, USA
Angelo C. Loula, State University of Feira de Santana, Brazil
Efrén Mezura-Montes, Laboratorio Nacional de Informática Avanzada, Mexico
Julian Miller, University of York, UK
Melanie Mitchell, Portland State University, USA
Beatrice Palano, University of Milan, Italy
Prantosh Kumar Kumar Paul, Raiganj University, India
Nelishia Pillay, University of KwaZulu-Natal, South Africa
Alfonso Rodriguez-Paton, Campus de Montegancedo, Spain
Juan Carlos Seck Tuoh Mora, Autonomous University of Hidalgo, Mexico
Michael L. Simpson, University of Tennessee, USA
Pietro Speroni di Fenizio, University of Coimbra, Portugal
Gunnar Tufte, Norwegian University of Science and Technology, Norway
Call for Articles
International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018 • ISSN: 1947-928X • eISSN: 1947-9298
An official publication of the Information Resources Management Association
Mission
The mission of the International Journal of Natural Computing Research (IJNCR) is to serve as a world-leading forum for the publication of scientific and technological papers involving all main areas of natural
computing, namely, nature-inspired computing, methods to computationally synthesize natural phenomena,
and novel computing paradigms based on natural materials. This journal publishes original material, theoretical
and experimental applications, and review papers on the process of extracting ideas from nature to develop
computational systems or materials to perform computation.
Coverage and major topics
The topics of interest in this journal include, but are not limited to:
Artificial chemistry • Artificial Immune Systems • Artificial life • Artificial Neural Networks • Cellular Automata
• Computational Image and Video Processing • Computational Intelligence • Fractal geometry • Genetic and
evolutionary algorithms • Growth and developmental algorithms • Intelligent Signal Processing • Molecular
computing • Nature-inspired Data Engineering • Quantum computing • Smart Computing Informatics • Swarm
Intelligence
All inquiries regarding IJNCR should be directed to the attention of:
Nilanjan Dey, Editor-in-Chief • IJNCR@igi-global.com
All manuscript submissions to IJNCR should be sent through the online submission system:
http://www.igi-global.com/authorseditors/titlesubmission/newproject.aspx
Ideas for Special Theme Issues may be submitted to the Editor(s)-in-Chief
Please recommend this publication to your librarian
For a convenient easy-to-use library recommendation form, please visit:
http://www.igi-global.com/IJNCR
Table of Contents
International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018 • ISSN: 1947-928X • eISSN: 1947-9298
An official publication of the Information Resources Management Association
Research Articles
1	Neural Network Model for Semantic Analysis of Sanskrit Text
Smita Selot, Department of Computer Applications, SSTC, Bhilai, India
Neeta Tripathi, SSTC, Bhilai, India
A. S. Zadgaonkar, CV Raman University, Bilaspur, India

15	Client-Awareness Resource Allotment and Job Scheduling in Heterogeneous Cloud by Using Social Group Optimization
Phani Praveen S., Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore, India
K. Thirupathi Rao, Computer Science and Engineering, K L Deemed to be University, Guntur, India

32	VRoptBees: A Bee-Inspired Framework for Solving Vehicle Routing Problems
Thiago A.S. Masutti, Mackenzie Presbyterian University, Sao Paulo, Brazil
Leandro Nunes de Castro, Mackenzie Presbyterian University, Sao Paulo, Brazil

57	Analysis of Feature Selection and Ensemble Classifier Methods for Intrusion Detection
H.P. Vinutha, Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India
Poornima Basavaraju, Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India
Copyright
The International Journal of Natural Computing Research (IJNCR) (ISSN 1947-928X; eISSN 1947-9298), Copyright © 2018 IGI Global. All rights,
including translation into other languages reserved by the publisher. No part of this journal may be reproduced or used in any form or by any means without
written permission from the publisher, except for noncommercial, educational use including classroom teaching purposes. Product or company names
used in this journal are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI
Global of the trademark or registered trademark. The views expressed in this journal are those of the authors but not necessarily of IGI Global.
The International Journal of Natural Computing Research is indexed or listed in the following: ACM Digital Library;
Bacon’s Media Directory; Cabell’s Directories; DBLP; Google Scholar; INSPEC; JournalTOCs; MediaFinder; The
Standard Periodical Directory; Ulrich’s Periodicals Directory
International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018
Neural Network Model for Semantic
Analysis of Sanskrit Text
Smita Selot, Department of Computer Applications, SSTC, Bhilai, India
Neeta Tripathi, SSTC, Bhilai, India
A. S. Zadgaonkar, CV Raman University, Bilaspur, India
ABSTRACT
Semantic analysis is the process of extracting the meaning of a sentence in a given language. From the perspective of computer processing, the challenge lies in making the computer understand the meaning of the given sentence. Understandability depends upon the grammar, the syntactic and semantic representation of the language, and the methods employed for extracting these parameters. Semantic interpretation methods for natural language vary from language to language, as the grammatical structure and morphological representation of one language may differ from another. One ancient Indian language, Sanskrit, has its own unique way of embedding syntactic information within the relevant words of a sentence. Sanskrit grammar, defined by pAninI in nearly 4,000 rules, reveals the mechanism of adding suffixes to words according to their use in a sentence. This article presents a method of extracting meaningful information through suffixes and classifying each word into a defined semantic category. The application of NN-based classification has improved the processing of the text.
Keywords
Back Propagation Algorithm, Natural Language Processing, Neural Network, pAninI Grammar Framework, Semantic Analysis
INTRODUCTION
Semantics, in its basic form, is the meaning of a sentence. In a computer-driven world of automation, it has become necessary for machines to understand the meaning of given text for applications like automatic answer evaluation, summary generation, translation systems, etc. In linguistics, semantic analysis is the process of relating syntactic structures, from the words and phrases of a sentence, to their language-independent meaning. Given a sentence, one way to perform semantic analysis is to identify the relation of the words with the action entity of the sentence. For example, in "Rohit ate ice cream", the agent of the action is Rohit and the object on which the action is performed is ice cream. This type of association creates a predicate-argument relation between the verb and its constituents. In the Sanskrit language, this association is achieved through kArakA analysis. Understanding of a language is guided by its semantic interpretation; semantic analysis in Sanskrit is guided by six basic semantic roles given by pAninI as kAraka values.
DOI: 10.4018/IJNCR.2018010101

pAninI, an ancient Sanskrit grammarian, formulated nearly 4,000 rules, called sutras, in a book called the aShTAdhyAyI, meaning "eight chapters". These rules describe a transformational grammar, which transforms a root word into a number of dictionary words by adding the proper suffix, prefix, or both, to the root word. The suffix to be added depends on the category, gender and number of the word. Structured tables containing the suffixes are maintained for this purpose. These declension tables are designed in such a way that positions in the table are defined with respect to number, gender and kAraka value. Words with similar endings follow the same declension; for example, rAma is an a-ending root word, and the words generated using the a-ending declension table are rAmaH, rAmau and rAmAH, obtained by appending H, au and AH to rAma, respectively. Suffix-based information about a word reveals not only its syntax but also drives a way to find the semantic relation of words with the verb using kAraka theory.
The next section describes the Sanskrit language and kAraka theory; section three states the problem definition, followed by the NN model for semantic analysis. Features extracted from a corpus of pre-annotated text are supplied as input to the system, with the objective of making the system learn the six kAraka defined by pAninI. This paper presents the concept of the Neural Network, work done in the field of NN and Natural Language Processing, the algorithm, the annotated corpus and the results obtained.
SANSKRIT AND kAraka THEORY
The Sanskrit language, with its well-defined grammatical and morphological structure, not only presents the relation of suffixes and affixes with the word, but also provides syntactic and semantic information about the words in a sentence. Due to its rich inflectional morphological structure, it is predicted to be suitable for computer processing. Work at NASA on the Sanskrit language reported that triplets (role of the word, word, action) generated from this language are equivalent to a semantic net representation (Briggs, 1985).
Sanskrit grammar was developed by three main individuals over the years: pAninI, kAtyAyana and Patanjali. pAninI was the first to identify nearly 4,000 rules to define the language (Kak, 1987). Sanskrit is one of the 22 official languages of India and a standardized dialect of the Old Indo-Aryan community. It is a rule-based language having both flexibility and precision. It is an order-free language: changing the position of a word does not change the meaning of the sentence. Processing techniques for order-free languages cannot be derived from rigid-order languages like English. pAninI's idea of semantic interpretation was based on the identification of the relations of words with the action in a sentence, as agent, object, etc. This is called kAraka theory. In Sanskrit, this relationship is developed by adding specific, pre-set syllables, known as case endings or vibhakti, to the basic noun form. Natural language processing in the pAninIan way describes the processing of language along these lines.
Sanskrit is a highly inflectional language, where words are generated from a root word. The structure of a word is given as <word> = <prefix><root_word><suffix>. In the pAninI system, a finite set of rules is enough to generate infinite sentences (Kak, 1987). Accurate extraction of the prefix and suffix by a morphological analyzer helps in the syntactic and semantic analysis of the sentence. Case endings are the driving force for semantic interpretation, as they can change the meaning of the sentence, as in the examples below:
1. rAmaH bAlakam grAmAt yaati. (Rama brings the boy from the village.)
2. rAmaH bAlakam grAmam yaati. (Rama takes the boy to the village.)
This paper focuses on kAraka-based semantic analysis of text under the six kAraka described in Table 1. Identification of the kAraka values of words forms the basis of fundamental semantic interpretation (Bharati et al., 2008; Dwivedi, 2002). The rules of the ancient kAraka theory for external sandhi have been modeled with finite state machines (Hyman, 2009), and the potential of pAninI grammar exceeds the power of finite state machines (Vaidya et al., 2009). kAraka roles are equivalent to case-based semantics
Table 1. The six kAraka in the pAninIan model

S.No | Case         | kAraka    | Vibhakti  | Meaning
1    | Nominative   | kartA     | prathamA  | Agent
2    | Accusative   | karma     | dviteeyA  | Object
3    | Instrumental | karaNa    | tRiteeyA  | Instrument
4    | Dative       | sampradAn | chaturthi | Recipient
5    | Ablative     | apAdAn    | panchami  | Departure
6    | Locative     | adhikaraN | saptami   | Place
in event-driven situations, where each word or phrase is related to the action. This relation is labeled with cases like agent, object and location (Kak, 1987; Hellwig, 2007). Under the pAninI system of kAraka they are identified as kartA, karma, karan, sampradAn, apAdAn and adhikaraN, which are interpreted as agent, object, instrument, recipient, source of departure and location respectively, as presented in Table 1 (Kiparsky, 2002). Various algorithms have evolved over the years for the semantic classification of words, text and documents. Rule-based systems are also a conventional and effective means of extracting information. The order of application of rules from an NLP perspective has been optimized and implemented on the subantaprakaraṇa of the Siddhāntakaumudī; such a system works well, as Sanskrit is a rule-based language. There is no order of rules defined in the theory, but for practical implementation an optimized order was defined by the authors (Patel & Katuri, 2015). One of the major problems is designing and incorporating a large volume of rules and handling the overlapping or conflicting cases (Selot et al., 2010). The development of a knowledge representation for Sanskrit based on the stated theory has also been experimented with (Selot et al., 2007). Dependency amongst the constituent words of a sentence is also a means of representing semantic relations (Pedersen et al., 2004).
A stochastic part-of-speech (POS) tagger for unprocessed Sanskrit text using a Hidden Markov Model was designed and analyzed. Parameters for the system were obtained from an annotated corpus of about 1,500,000 words, and the system analyzed unannotated Sanskrit text by repeated application of the stochastic model (Hellwig, 2007). Mathematical models such as Hidden Markov Models, Maximum Entropy and Support Vector Machines are extensively used in natural language processing (Timothy, 2005). With the evolution of statistical methods, N-gram models became popular in POS tagging; Trigrams'n'Tags (TnT) has proved to be an efficient POS tagging tool for the Persian language (Tasharofi, 2008). Microsoft Research India (MSRI) introduced the hierarchical Indic language POS tagset (IL-POSTS) for tagging Indian languages, and an experiment on the use of IL-POSTS for annotating a Sanskrit corpus was conducted (Jha et al., 2010). Though statistical models are quite efficient and popular for language processing, neural networks are equally efficient, with reduced processing time, in applications like text categorization (Babu, 2010).
With the availability of natural language processing tools like NLTK, an open-source platform, computation and analysis in the domain has taken a step forward. However, for Indian languages, especially Sanskrit, the availability of a large volume of corpus and its processing is still a challenge (Joon et al., 2017).
Use of Neural Networks for complex tasks like NLP is rarely found, as most of the work done using stochastic models reports an accuracy of more than 90%. A stochastic model requires a large amount of pre-annotated corpus to calculate the various probability values (Shan, 2004). The lack of such large corpora for regional languages hinders the development of NLP systems with high accuracy rates. A Neural Network appears to be the better choice for classifying Sanskrit words into semantic classes, as it has a paradigm-based representation and has reported better results under sparse data.
PROBLEM DEFINITION
Processing of Sanskrit text is challenging for two main reasons: firstly, it is order free, and secondly, words coagulate at junctions to form new words. Output-oriented training for any language requires a large tagged corpus for good results, since feature extraction from individual words is possible with such tagged data. The scarcity of tagged data in the Sanskrit language restricts the number of algorithms that can be implemented for semantic extraction. Hence, the main objective is to identify the semantic relation of a word with the action entity of the sentence under the stated constraints. Given a word with its suffix, the system identifies the kAraka value of the word from the set of six predefined kAraka roles. The system is first trained for the semantic classes using a manually pre-annotated corpus. Features are extracted from the corpus; sets of input/output pairs are stored and used for training. The system's performance is then evaluated on a new set of tagged sentences. Manually annotated text was developed for the experiments.
The challenge was mainly associated with conflicting cases: when a suffix maps to multiple semantic classes, a means to resolve them is needed. The proposed system handles such conflicts by maintaining context-based features, as most of the conflicts are resolved by the occurrence of a particular word or its constituents in the context window. A Neural Network is used to train the system, as it works well with sparse data. The effort in developing this algorithm has the objective of accomplishing the learning of semantic relations through a technical approach and analysing the language's inbuilt scientific way of representation. Finally, the performance of the system is reported.
FEATURE SELECTION
An annotated corpus describing the syntactic and kAraka categories of words was manually generated. It includes the suffix-affix information, category and vibhakti for nouns, and tense and mood for verbs. The set of features generally used to train the system comprises morphological features, corpus-based features, context-based features and lexicon-based features.
Morphological Features
These include the suffixes and prefixes of the words in the context. A prefix or suffix is a sequence of the first or last few characters of a word, which need not possess linguistic meaning. The use of prefix and suffix information works well for highly inflected languages.
Corpus Based Features
Corpus-based features play a crucial role in the tagging of words. Each word is to be tagged with one of the six pAninI classes. Not all words fall under the six tags; some words exhibit a relation with subsequent words and are categorized in a relation class. Words which fall neither in the six tags nor in the relation tag are assigned the category other. To finalize a tag for a word, its historical information, in the form of probabilities, is calculated from the corpus. A probability vector is associated with each suffix, which gives the chances of assigning a particular kAraka to a word with a defined suffix.
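As a minimal sketch of this corpus statistic (assuming a simple in-memory list of (suffix, tag) pairs; the function and tag names are illustrative, not from the paper), the per-suffix probability vector can be computed by counting tag co-occurrences:

from collections import Counter, defaultdict

def suffix_probability_vectors(tagged_corpus):
    """Build a probability vector per suffix from (suffix, karaka_tag) pairs.

    tagged_corpus: iterable of (suffix, tag) tuples, where tag is one of the
    six kAraka labels, 'relation', or 'other' (labels assumed for illustration).
    """
    counts = defaultdict(Counter)
    for suffix, tag in tagged_corpus:
        counts[suffix][tag] += 1
    vectors = {}
    for suffix, tag_counts in counts.items():
        total = sum(tag_counts.values())
        vectors[suffix] = {tag: n / total for tag, n in tag_counts.items()}
    return vectors

# Example: the suffix 'am' is ambiguous between kartA and karma readings
corpus = [("am", "karma"), ("am", "karta"), ("am", "karma"), ("H", "karta")]
print(suffix_probability_vectors(corpus)["am"])  # karma ~0.67, karta ~0.33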
Context Based Features
In the declension tables, in some cases a particular suffix occupies more than one place, resulting in ambiguity. To resolve the kAraka or vibhakti ambiguity associated with a word, the values of the preceding and successive words are taken into consideration. It is arguable that for order-free languages, like Sanskrit and Hindi, context-based features should not be considered. But it has been found that the occurrence of a particular word before or after the word in question generates important clues for resolving the kAraka or vibhakti ambiguity. For example, in the word vanam, the am suffix occupies two different vibhakti positions, first and second. As observed in the grammar, the occurrence of words like dhik, hA, ubhyatH and pAritH confirms the vibhakti value of am as dviteeyA (Rao, 2005). Context windows of various sizes are used in problems like word sense disambiguation (WSD), POS tagging, semantic tagging, etc. A window size of three gives better results, maintaining the balance between accuracy and processing overhead. Hence a context window of size three (wi-1, wi, wi+1) is taken for resolving the vibhakti ambiguity associated with the Sanskrit language.
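A minimal sketch of this window-based disambiguation follows (function and variable names are illustrative; the clue-word list is the one quoted above for the am suffix):

def context_window(words, i):
    """Return the (w_{i-1}, w_i, w_{i+1}) window of size three used to
    resolve kAraka/vibhakti ambiguity; sentence boundaries yield None."""
    prev_w = words[i - 1] if i > 0 else None
    next_w = words[i + 1] if i + 1 < len(words) else None
    return (prev_w, words[i], next_w)

# 'am' in 'vanam' is ambiguous between the first and second vibhakti;
# a clue word such as 'dhik' in the window confirms the dviteeyA reading.
CLUE_WORDS = {"dhik", "hA", "ubhyatH", "pAritH"}

def resolve_am(words, i):
    window = context_window(words, i)
    present = {w for w in window if w is not None}
    return "dviteeyA" if CLUE_WORDS & present else "ambiguous"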
Lexicon Based Feature
Instead of maintaining a complete dictionary of words, lists of verbal and nominal bases have been maintained. There is a strong association of words with the verb occurring in the sentence. For example, in deciding the vibhakti value for the am suffix, the occurrence of any one of sixteen verbal bases confirms the vibhakti value of am as second. These sixteen verbs are {dhuH, yAch, pach, chi, bru, Shas, anD, rudh, pRich, ji, math, mush, ni, kriSh, hRi} (Dwivedi, 2002). Likewise, a complete database of nominal bases is also maintained. Features are selected from the annotated corpus and are assigned numeric values after mapping them to numeric data. A sample of the tagged data used for training is shown in Figure 1, where each word is tagged with its suffix, syntactic category and kAraka value. For example, the first word in Figure 1, swAmI, contains the I suffix, its part of speech is noun, and its kAraka value is 1 (prathamA).
The set of selected features is quantified and used for training the network. The text coverage of a suffix plays an important role in resolving conflicts.
NN MODEL FOR SEMANTIC CLASSIFICATION
Popular methods for NLP are rule-based or stochastic models, where the former emphasizes the rules of the language and the latter uses a large corpus for processing. A major task in NLP at the syntactic and semantic level is the classification of words or phrases. The neural network is a popular tool for classification tasks, using an extensively data-driven approach for solving complex problems. The capabilities of NN have been exploited in classification tasks related to natural language, such as text categorization, POS tagging, etc. It has been shown that NN-based text categorization is better than the N-gram approach, giving 98% accuracy at a speed five times faster than N-grams (Babu et al., 2010). A multi-layered neural network with back propagation is used for solving the problem. The input to the system is the syntactic and morphological features of the previous and successive words in a sentence. The working of the multi-layer perceptron (MLP) based NN is explained briefly below; its structure is shown in Figure 2.
MLP Network
It consists of a minimum of three layers, input, hidden and output, represented by the symbols i, h and o as presented in Figure 2. Each layer consists of processing units termed neurons. The fully connected network uses a sigmoid activation function, which limits the output of each hidden unit to the range between 0 and 1 (see Equation 2). The MLP projects the data to be classified into different spaces, where the neurons of the output layer participate in the separation of the classes. The final outcome is obtained by applying a purelin (linear) function, as the output lies in the range 1 to 6 depending upon the kAraka value.

Figure 1. Sample tagged data

Figure 2. Three-layer MLP network
Back Propagation Learning Algorithm
Learning algorithms are commonly used to make a computer system learn a concept for a particular application. The Neural Network is one such learning environment, simulating the human brain. It consists of three layers, namely the input, hidden and output layers. The network is trained with a set of defined input-output data. The performance of the system is tested by comparing the actual output with the desired one. The back propagation algorithm calculates the error and sends it back to the neurons for weight adjustment; after each pass the error is reduced and the system is tuned for better results. The system works in two phases. In the forward phase, the weights are multiplied by their respective inputs and a bias is added; after squashing the values, the result is fed as input to the hidden layer. The data obtained from the hidden layer is fed to the output layer, where a sigmoid function is applied to limit the range. The output obtained is compared with the desired output and the error is measured as the difference of the two. In the backward phase, the error is transmitted back and the weights are modified as a function of the error; in the forward phase, the weights of the directed links remain unchanged at each processing unit of the hidden layer. For n input values to the network, a weighted sum is obtained and the sigmoid function is applied to this weighted sum, as given in Equations (1) and (2):
u_j = \sum_{i=1}^{n} w_{ij}^{h} p_i + b    (1)

h_j = \frac{1}{1 + e^{-u_j}}    (2)
where h_j is the output from the jth hidden unit. At each processing unit j of the output layer, the weighted outputs from all the neurons of the hidden layer are added using Equation (3), and the resulting value is passed through a sigmoid function as given in Equation (4):
v_j = \sum_{i=1}^{n} w_{ij}^{y} h_i + b    (3)

y_j = \frac{1}{1 + e^{-v_j}}    (4)
where y_j is the final output. If t_o is the desired output and y_o is the actual output, then in the backward pass, for every output neuron, the local gradient is calculated as per Equation (5):

g_o = E_o \, y_o (1 - y_o)    (5)

where

E_o = t_o - y_o    (6)
Likewise, for each hidden neuron, the local gradient is calculated as:

g_h = E_h \, h_j (1 - h_j)    (7)

where

E_h = \sum_{o=1}^{n} g_o \, w_{oj}^{y}    (8)
To speed up the training process, a momentum term α and a learning rate η are added to the weight update formulae in the hidden and output layers, as given in Equations (9) and (10):

\Delta w_{ij}^{h}(new) = \eta \, g_h \, p_i + \alpha \, \Delta w_{ij}^{h}(old), \quad w_{ij}^{h}(new) = w_{ij}^{h}(old) + \Delta w_{ij}^{h}(new)    (9)

\Delta w_{ij}^{y}(new) = \eta \, g_o \, h_i + \alpha \, \Delta w_{ij}^{y}(old), \quad w_{ij}^{y}(new) = w_{ij}^{y}(old) + \Delta w_{ij}^{y}(new)    (10)
Outline of Algorithm
Step 1: Initialize the weights to small random values.
Step 2: For each input-output training pair:
Input vector = <p1, p2, …, pn>
Weights for the first neuron = <w11, w21, w31, …, wn1>
Output = <y0>
Step 3: Transmit the input values to the neurons of the hidden layer:
u_j = \sum_{i=1}^{n} w_{ij}^{h} p_i + b
Step 4: Apply the activation function:
h_j = 1 / (1 + e^{-u_j})
Step 5: Each output unit receives the sum of its weighted inputs and applies the activation function to it:
v_j = \sum_{i=1}^{n} w_{ij}^{y} h_i + b
y_j = 1 / (1 + e^{-v_j})
Step 6: At each output unit, the difference between the target and actual value is calculated as the error:
E_rr = t_o - y_o
Step 7: At each hidden unit, the error is calculated from the error of the layer above it.
Step 8: The weights and biases are updated at the input and hidden layers.
Step 9: The stopping condition is tested.
Step 10: Repeat Steps 3 to 9 till the stopping condition is achieved.
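A minimal NumPy sketch of one training pass following Equations (1)-(10) is given below. All names are illustrative, not from the paper; for brevity the biases are updated without momentum, and the sigmoid output of the equations is used rather than the purelin output mentioned earlier:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(p, t, Wh, bh, Wy, by, prev, eta=0.01, alpha=0.9):
    """One forward/backward pass. p: input features, t: target vector;
    prev: dict carrying the previous weight deltas for the momentum term."""
    # Forward phase: Equations (1)-(4)
    u = Wh @ p + bh            # (1) weighted sum at the hidden layer
    h = sigmoid(u)             # (2) squashed hidden output
    v = Wy @ h + by            # (3) weighted sum at the output layer
    y = sigmoid(v)             # (4) final output
    # Backward phase: local gradients, Equations (5)-(8)
    E_o = t - y                # (6) output error
    g_o = E_o * y * (1 - y)    # (5) output local gradient
    E_h = Wy.T @ g_o           # (8) error propagated to the hidden units
    g_h = E_h * h * (1 - h)    # (7) hidden local gradient
    # Weight updates with learning rate eta and momentum alpha, (9)-(10)
    dWh = eta * np.outer(g_h, p) + alpha * prev.get("dWh", 0.0)
    dWy = eta * np.outer(g_o, h) + alpha * prev.get("dWy", 0.0)
    prev["dWh"], prev["dWy"] = dWh, dWy
    return Wh + dWh, bh + eta * g_h, Wy + dWy, by + eta * g_o

# Usage with a tiny randomly initialized 3-4-1 network:
rng = np.random.default_rng(0)
Wh, bh = 0.1 * rng.normal(size=(4, 3)), np.zeros(4)
Wy, by = 0.1 * rng.normal(size=(1, 4)), np.zeros(1)
prev = {}
Wh, bh, Wy, by = train_step(np.array([1.0, 2.0, 0.0]), np.array([1.0]),
                            Wh, bh, Wy, by, prev)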
Processing of text using a NN requires a defined set of input data in numerical form. Numerically encoded data is generated by capturing various features of the input. For language processing applications, these features include morphological, corpus-based, context-based and lexicon-based features. The next section explains the generation of the encoded data.
Encoding of the Input Data
The feature vector, the input to the NN, is generated by converting the different features to their numeric form as described in Table 2. The algorithm for encoding the input values reads the pre-annotated text and converts the data into feature vectors.
The system is trained using a multilayer neural network with squashing functions in the hidden layer. These functions compress an infinite range of input values to a finite output range. In sigmoid functions, the slope approaches zero as the input gets larger. This causes a problem for steepest descent: since the gradient has a small value, the changes in the weights and biases are very small, even though they are far from their optimum values. Resilient back propagation removes this problem with the magnitude of the partial derivative.
Table 2. Sample set of encoded values

Class \ number | 1         | 2      | 3          | 4     | 5      | 6
Category       | Pronoun   | Noun   | Verb       | Avaya | Adverb | -
Prefix (verb)  | Up        | Anu    | Adi        | A     | Abhini | -
Verb (root)    | Gachch    | Pach   | Kar        | Bhu   | Anay   | as
Vibhakti       | Agent     | Object | Instrument | Cause | From   | Relation
Suffix (verb)  | ti        | si     | At         | ami   | Anti   | AmH
Suffix (noun)  | No suffix | Am     | H          | Aya   | ebhyH  | Asya
Resilient back propagation uses only the sign of the derivative to determine the direction of the weight update. The amount of weight change is determined by a separate per-weight update value, which is decreased by a factor delta_dec whenever the derivative changes sign from the previous iteration and increased by a factor delta_inc if the derivative keeps the same sign. The weight is left unchanged if the derivative is zero; the change thus shrinks for oscillating weights and grows if the weight continues to change in the same direction. In the NN-based semantic classification, the resilient back propagation algorithm is implemented.
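The update rule just described can be sketched as follows. This is a minimal illustration of resilient backpropagation under the description above, not the paper's exact implementation; the factor values delta_inc = 1.2 and delta_dec = 0.5 and the step bounds are common defaults assumed here:

import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 delta_inc=1.2, delta_dec=0.5,
                 step_min=1e-6, step_max=50.0):
    """One resilient-backpropagation update over arrays of weights.
    Only the sign of the gradient sets the direction; the per-weight step
    grows when the sign is unchanged and shrinks when it flips."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * delta_inc, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * delta_dec, step_min), step)
    w_new = w - np.sign(grad) * step  # a zero derivative leaves the weight unchanged
    return w_new, step

This decouples the size of the update from the magnitude of the partial derivative, which is what removes the small-gradient stagnation of plain steepest descent in the flat regions of the sigmoid.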
Input Data for the BPN
The feature vector, the main input to the network, is generated by extracting parameters from the annotated corpus. These parameters are classified under different categories, as given in Table 3. Suffixes and prefixes fall under morphological features and are a crucial part of the feature vector. Apart from this, context-based features and the syntactic class also form part of the vector. Context-based features are significant as they help in resolving conflicts: the choice among multiple matched suffixes in a particular vibhakti group is governed by the occurrence of special words at the preceding and successive positions.
A data set of pre-annotated sentences was created with tagged words revealing the syntactic, morphological and semantic features of the words, such as suffix, category and vibhakti value. The annotated corpus contains sentences taken from various Sanskrit text books. An example of the data set and its generated encoded form is presented in Figure 3. The algorithm for generating the encoded data after reading each sentence of the pre-annotated data is presented below:
Step 1: Identify the words of each sentence.
Step 2: Read the features tagged with each word.
Step 3: Identify the type of each feature.
Step 4: Convert the features into numeric values as described in Table 2.
Step 5: Identify the vibhakti value and store it as the target value.
Step 6: Store the information in a file.

The output is the kAraka value of the word, stored in a file. Each case or kAraka is encoded as follows: 1 for kartA, 2 for karma, 3 for karan, 4 for sampradAn, 5 for apAdAn and 7 for adhikaran. The number 6 is used for the sambandh relation, which is not included in the pAninI kAraka classes.
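A minimal sketch of Steps 1-6 follows. The paper does not specify the file format of the annotated corpus, so a simple word|suffix|category|vibhakti token layout and a small subset of the Table 2 codes are assumed here:

# Numeric codes after Table 2 (subset); unknown values map to 0.
CATEGORY = {"pronoun": 1, "noun": 2, "verb": 3, "avaya": 4, "adverb": 5}
NOUN_SUFFIX = {"nosuffix": 1, "am": 2, "h": 3, "aya": 4, "ebhyh": 5, "asya": 6}

def encode_sentence(tagged_line):
    """Steps 1-6: read the tagged words, map the features to numbers, and
    pair each feature vector with its target kAraka value."""
    samples = []
    for token in tagged_line.split():
        word, suffix, category, vibhakti = token.split("|")
        features = [
            CATEGORY.get(category.lower(), 0),
            NOUN_SUFFIX.get(suffix.lower(), 0),
        ]
        samples.append((features, int(vibhakti)))  # target stored with input
    return samples

# e.g. 'swAmI|I|noun|1' -> noun with an unlisted suffix, target kAraka 1 (prathamA)
print(encode_sentence("swAmI|I|noun|1"))  # [([2, 0], 1)]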
The encoded data for 200 sentences was used for training the network with back propagation. The network was trained using the resilient back propagation algorithm on this data set, and the result is shown in Figure 4. The encoded input/output values were used as the BPN data set and the system performance was calculated. As shown in the figure, the system was trained in less than 400 epochs.
CONCLUSION
Experiments for kAraka-based semantic classification using neural networks were conducted as per the algorithms discussed in the previous sections. Tagged data of 200 sentences was created, as shown in Figure 1.
Table 3. Features selected

Parameter                           | Feature type
Prefix and suffix of noun and verb  | Morphological feature
Word category                       | Feature based on syntax
Verb root                           | Lexical feature
Successive and previous words       | Context-based feature
Probability of occurrence of suffix | Feature based on corpus
Figure 3. (a) Annotated data; (b) encoded values
Out of the 200 sentences, 80% of the data was used for training the network with back propagation and 20% was used for testing. The back propagation algorithm with gradient descent and resilient learning was used to train the three-layered network, where the system was trained in 500-800 epochs. The resilient BP algorithm has shown better performance than gradient descent. The data was tested with different learning rates, as shown in Table 4, where n = number of neurons and lr = learning rate.
The data at serial number 2 shows the best performance with respect to time; otherwise, as the number of neurons increases, the training time increases. The time also depends on the learning rate: the higher the learning rate, the faster the system, and vice versa. The trained NN was tested on 20 sentences; the identification of each class is shown in Table 5. Using the resilient BPN algorithm, the best output is obtained with 200, 250 and 300 neurons and a 0.01 learning rate. Using this data, the system was further tested, and the output was classified into the various semantic classes. The accuracy of the test was measured by the F-measure given in Equation (11).
Figure 4. Training of the NN for semantic classification
Table 4. Trained data with different learning rates

S. No | Epochs | n (neurons) | lr   | Performance | Time (in sec)
1     | 460    | 200         | 0.01 | 0.009981    | 11.20
2     | 322    | 200         | 0.03 | 0.009980    | 7.75
3     | 507    | 200         | 0.05 | 0.009980    | 12.43
4     | 462    | 250         | 0.01 | 0.009980    | 18.62
5     | 454    | 250         | 0.03 | 0.009990    | 17.72
6     | 270    | 250         | 0.05 | 0.009980    | 10.59
7     | 243    | 300         | 0.05 | 0.009996    | 11.32
8     | 382    | 300         | 0.03 | 0.009995    | 17.66
9     | 247    | 300         | 0.01 | 0.009985    | 11.81
F = 2(p × r) / (p + r)    (11)

where p is precision, r is recall and F is the F-score. p and r are calculated as follows:

p = number of accurate results / total number of returned results
r = number of exact results / number of results that should have been returned

The F-score is the harmonic mean of precision and recall, as in Equation (11). In Table 5, f = frequency of occurrence of each class in the corpus, n = total count of the particular class in the data, and c = the number correctly identified by the system; the F-score of each class under the NN is given in Table 5.
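As a quick arithmetic check of the per-class scores (a minimal sketch; the function name is illustrative), the values in Table 5 follow directly from f, n and c:

def f_score(n_returned, n_correct, n_expected):
    """p = correct/returned, r = correct/expected, F = 2pr/(p+r), per Equation (11)."""
    p = n_correct / n_returned
    r = n_correct / n_expected
    return p, r, 2 * p * r / (p + r)

# kartA row of Table 5: f = 20 expected, n = 18 returned, c = 17 correct
p, r, F = f_score(18, 17, 20)
print(round(p, 3), round(r, 3), round(F, 3))  # ~0.944 0.85 0.895 (table rounds down)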
The average performance of the system with respect to F-score is 0.8243. Test data with 40 sentences was used for case frame generation, out of which 22 were generated with 100% accuracy, 12 with accuracy between 50-100% and the remaining with accuracy less than 50%, thereby giving an overall
Table 5. Classification of the pAninI classes for the test data

pAninI class | f  | n  | c  | p     | r     | F
kartA        | 20 | 18 | 17 | 0.940 | 0.850 | 0.890
karma        | 15 | 12 | 11 | 0.916 | 0.730 | 0.812
karan        | 15 | 13 | 10 | 0.760 | 0.660 | 0.706
sampradAn    | 13 | 12 | 11 | 0.916 | 0.846 | 0.879
apAdAn       | 13 | 12 | 10 | 0.833 | 0.769 | 0.790
adhikaran    | 12 | 11 | 10 | 0.909 | 0.833 | 0.869
performance of 72.5%. The F-measure presents the weighted score of the individual kAraka values, while the overall accuracy is the measured degree of correctness achieved in each CF generation.
Statistical tools are commonly used in NLP applications like POS tagging, semantic classification, word sense disambiguation, etc. Due to the inherent complexities and ambiguities in languages, syntactic and semantic classification is a great challenge. A tagged corpus with wide coverage of all scenarios plays an important role in training an NLP system, and statistical tools supported by large sets of tagged corpus have given accuracies of 90% and above. However, not all languages are rich in resources, and tagged corpora rarely cover all scenarios or conflicting cases. Processing Sanskrit, an ancient language with a deep-rooted morphological structure but scarce resources, was therefore a great challenge. At the same time, the application of neural networks for classification under sparse data has proved to be the better tool. Hence, an experimental setup was built to analyze the semantic classification after training a NN for these classes under the pAninI framework. The NN has proved to be an efficient tool for the semantic classification of Sanskrit words.
This work can be further enhanced with automatic feature extraction for each word. As the structure of the basic word in Sanskrit contains a prefix, a root word and a suffix, an algorithm for breaking a word into its constituents and mapping it into the feature set can be developed. Incorporation of deep learning into the classification can be experimented with for further improvement of the system.
REFERENCES
Babu, A., Suresh, K., & Pavan, P. N. V. S. (2010). Comparing neural network approach with n-gram approach
for text categorization. International Journal on Computer Science and Engineering, 2(1), 80–83.
Bharati, A., Kulkarni, A., & Sivaja, S. N. (2008). Use of Amarako’sa and Hindi WordNet in Building a Network
of Sanskrit Words. In 6th International Conference on Natural Language Processing ICON ’08. India: Macmillan
Publishers.
Bhattacharyya, P. (2010, May). IndoWordNet. In Proceedings of the Lexical Resources and Engineering Conference (LREC), Malta.
Briggs, R. (1985). Knowledge Representation in Sanskrit and Artificial Intelligence. AI Magazine, 6(1), 32–39.
Chaitanya, V., & Kulkarni, A. (2015). Sanskrit and Computational Linguistics. In 16th World Sanskrit Conference,
28 June – 2 July , Silpakorn University, Bangkok.
Dwivedi, K. (2002). Prarambhik RachanAnuvAdakumaudi (19th ed.). Varanasi: VishavavidyAlaya PrakAshan.
Hellwig, O. (2011). Performance of a lexical and POS tagger for Sanskrit. In Proceedings of fourth international
Sanskrit computational linguistic symposium (pp. 162-172).
Huet, G. (2003). Towards Computational Processing of Sanskrit. Recent advances in Natural Language Processing.
In Proceedings in International conference ICON.
Hyman, M. D. (2009). From pAninian sandhi to finite state calculus. In Sanskrit Computational Linguistics. Springer-Verlag. doi:10.1007/978-3-642-00155-0_10
Jha, G. N., Gopal, M., & Mishra, D. (2009, November). Annotating Sanskrit corpus: adapting IL-POSTS. In
Language and Technology Conference (pp. 371-379). Springer, Berlin, Heidelberg.
Joon, R., Gupta, V., Dua, C., & Kaur, G. (2017). NLTK based processing of Sanskrit text. International Journal of Management and Applied Science, 3(8).
Kak, S. C. (1987). The Paninian approach to natural language processing. International Journal of Approximate Reasoning, 1(1), 117–130.
Kiparsky, P. (2002). On the architecture of Panini’s grammar. In International Conference on the Architecture
of Grammar.
Kumar, D., & Josan, G. (2010). Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey.
International Journal of Computers and Applications, 6(5), 32–41. doi:10.5120/540-704
Mittal, V. (2010). Automatic Sanskrit Segmentizer Using Finite State Transducers. In Proceedings of the ACL
2010 (pp. 85-90).
Bharati, A., Chaitanya, V., Sangal, R., & Ramakrishnamacharyulu, K. V. (1994). Natural language processing:
A Paninian perspective. PHI New Delhi.
Patel, D., & Katuri, S. (2015). Prakriyāpradarśinī: an open source subanta generator. In Sanskrit and Computational Linguistics, 16th World Sanskrit Conference, 28 June – 2 July, Silpakorn University, Bangkok.
Pedersen, M., Eades, D., Amin, S. K. & Prakash, L. (2004). Relative Clauses in Hindi and Arabic: A Paninian
Dependency Grammar Analysis.
Rao, B. N. (2005). Panini and Computer Science – into the future with knowledge from past.
Selot, S., & Singh, J. (2007). Knowledge representation and Information Retrieval in PAninI Grammar Framework.
In International Conference ICSCIS ’07 (Vol. 2, pp. 45-51).
Selot, S., Tripathi, N., & Zadgaonkar, A. (2010). Semantic Extraction in PGF using POS tagged data for Sanskrit.
i-manager’s Journal on Software Engineering, 5(2).
Selot, S., Tripathi, N., & Zadgaonkar, A. S. (2009). Transition network for processing of Sanskrit text for
identification of case endings. ICFAI Journal of Computational Science, 3(4), 32–38.
Shan, H. & Gildea, D. (2004). Semantic labelling by Maximum entropy model. University of Rochester.
Tasharofi, S., Raja, F., Oroumchian, F., & Rahgozar, M. (2008). Evaluation of statistical part of speech tagging
of Persian text. In 9th International symposium on signal processing and its applications.
Timothy, J. D., Hauser, M., & Tecumseh, W. (2005). Using mathematical models of language experimentally.
Trends in Cognitive Sciences, 9(6), 284–289. doi:10.1016/j.tics.2005.04.011 PMID:15925807
Vaidya, A., Husain, S., Mannem, P., & Misra, D. S. (2009). A Karaka Based Annotation Scheme for English.
In International Conference on Intelligent Text Processing and Computational Linguistics, LNCS (Vol. 5449,
pp. 41-52). Springer.
Smita Selot completed her master's in computer applications with Honors from Govt. Engineering College, Jabalpur, and obtained her Doctorate Degree from CSVTU, Bhilai. Her major research interests are natural language processing, knowledge representation and image processing. She has published 35 papers in national and international journals and conferences.
Client-Awareness Resource Allotment and
Job Scheduling in Heterogeneous Cloud
by Using Social Group Optimization
Phani Praveen S., Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore, India
K. Thirupathi Rao, Computer Science and Engineering, K L Deemed to be University, Guntur, India
ABSTRACT
Cloud providers and cloud clients often present several constraints, and thus allocation of resources in a heterogeneous cloud is a difficult job. As the traffic flow is quite variable, and client requirements and application sizes change regularly, the major challenge and concern is to map the external job requests to the available virtual machines. To reduce the gap between regularly changing client requirements and the existing resources, Client-Awareness Allocation of Resources and Scheduling of Jobs in a heterogeneous cloud by using social group optimization (SGOCARAJS) is proposed. This algorithm is mainly split into two phases, namely allocation of resources using SGO and shortest-job-first scheduling. The main aim is to map the jobs to the virtual machines of the cloud group to attain higher client satisfaction and the lowest makespan time. Experiments were conducted on datasets and the results were compared with existing scheduling techniques. The model showed that this algorithm outruns the available algorithms on the concerned metrics.
Keywords
Heterogeneous Cloud, Job Scheduling, SGO, Virtual Machine
INTRODUCTION
A heterogeneous cloud platform is a scientific model which provides resources with different functionalities to clients. Every external request gets a job id subsequent to being recorded in the heterogeneous cloud group. Cloud computing is unavoidable: it is used in areas such as the military, health, IT industry operations and so forth. The different features of cloud computing, like the pay-per-use pricing strategy, on-demand services and dynamic scaling, make the cloud environment interesting for the research community and for companies of all capabilities. The cloud amenities are provided as storage data centres. Infrastructure as a Service uses a scheduling strategy to assign virtual machines to client requests; FIFO scheduling is used by Amazon Web Services with batch processing. The success of a multi-cloud depends on load balancing, effective scheduling and, imperatively, teaming up among peer cloud service providers to form a consistent group that provides complex problem-solving procedures for everyday business, scientific and engineering applications.
DOI: 10.4018/IJNCR.2018010102

No data centre offers infinite resources for dynamic scaling. A client with highly demanding applications requires many instances to obtain timely deployment. Heterogeneous clouds have many difficulties: in a cloud environment, optimising resources which are spread over various locations is hard. A centralised administration is required to attain allotment of resources followed by scheduling of tasks. The cloud agent decides the procedures for scheduling and the types of virtual machines. The major issue taken up by research, business and academia is allotting external jobs to virtual machines. Allocation of virtual machines involves two phases, i.e. mapping and scheduling: mapping involves assigning the external tasks to the processing units, which are called virtual machines, and scheduling involves placing the order of the assigned jobs. In this paper, SGOCARAJS has been proposed. The proposed method is classified into two essential phases, known as allocation of resources and task scheduling. The experimental outcome shows that the proposed model can overtake the available algorithms in terms of makespan time and rate of customer satisfaction.
LITERATURE REVIEW
Allocation of resources is the allotment of jobs to resources with particular objectives, such as low execution time, low cost and a balanced load. The main aim is to assign resources to external jobs in such a way that the customer is satisfied, with low makespan time and high throughput. The most important factor in allocation of resources is load balancing, which aims to allot job requests to resources so that the executing units are neither idle nor overloaded. Frameworks are considered from a heterogeneous cloud group: our model contains many physical devices which have been divided into several virtual machines. A heterogeneous cloud consists of physical devices owned by several cloud service providers that are under a single group, considered here in simulation. The three major components of our model are the cloud user, the provider and the heterogeneous cloud platform. The heterogeneous cloud has various physical machines, whereas cloud users use resources to deploy applications. For simulation, external requests are generated using a Poisson distribution. Applications are of different sizes, and every application splits into various jobs. Tasks are allotted to virtual machines (VMs).
Scheduling of the tasks is of huge significance, as it relates to the performance of the cloud platform. It mainly decides the series of tasks executed by the virtual machines. Thus, scheduling and load balancing are techniques based on different phases of abstraction.
Allotment of resources includes the allocation of existing resources to virtual machines in an ideal way, limiting the makespan time. Several requests may be assigned to a particular virtual machine, and after ideal allotment of resources the system performance increases rapidly if the tasks are scheduled effectively.
The most important factor for the system behaviour is the prioritised execution of job requests. The most difficult problem in distributed computing is the pairing of scheduled cloud jobs (Armstrong et al., 1998; Pandey et al., 2010; Cao et al., 2009; Li et al., 2011; Praveen et al., 2017).
Users' requirements change dynamically, and several existing techniques become unsuccessful as the users' requirements and working platforms change. Algorithms for the scheduling of jobs which depend upon varying user requirements have been broadly considered by several authors. The rent management model known as Haizea is used to allow customers to choose between the two types of rent that are defined (Sotomayor et al., 2014; Satapathy, Mandal et al., 2016). Jobs are mostly classified as reserved and unreserved jobs: jobs that reserve resources in advance are known as reserved jobs, and they are non-preemptive in the whole environment. The resource constraints given at registration time are the application type, the processing power necessary for execution, and the resource period required for execution. The details of the requirements are stated in the SLA between the user and the cloud provider.
Researchers (Armstrong et al., 1998) allotted resources to jobs and estimated the runtime of the external load when the arrival rate is uncertain, recommending OLB and LBA for determining the efficiency of a multi-cloud. Researchers (Satapathy et al., 2016; Le, Nguyen, Bhateja & Satapathy, 2017) used the Analyst Cloud emulator for deployment, with SGO used to distribute the load equally and assign jobs to virtual machines. The parameters used in this experiment were equal priority for all jobs, one physical device and a linear arrival of external jobs. The major issue is that giving equal priority to all jobs may lead to inappropriate resource usage: applications with the least priority may be executed first, whereas high-priority applications are kept waiting for a long time. Researchers
(Fang et al., 2010) proposed two phases of mapping in the cloud: the first phase maps jobs to virtual machines, whereas the second phase maps virtual machines to servers. They introduced a formula to measure load balance from the estimated load. Researchers (Jang et al., 2012) used a GA for the allocation and scheduling of jobs; they classified cloud jobs into several categories, like high priority, medium priority and least priority. Researchers (Javanmardi et al., 2014) used a hybrid method combining fuzzy theory and GA for the allotment of resources in the cloud platform. Researchers (Chandrasekaran et al., 2013) used a GA for balancing the load of resources. Researchers (Dam et al., 2015) used gravitational emulation and a GA for the load balancing problem among various VMs; the proposed algorithm tries to reduce the makespan time. Researchers (Panda et al., 2015) proposed three scheduling algorithms, i.e. CMMN, MEMAX and MCC, for the heterogeneous cloud platform; their main goal was minimising makespan and improving cloud usage while keeping the customer satisfaction rate unchanged.
Authors used an enhanced DEA to optimize the scheduling of jobs, processed by allotment of resources based on time and cost; the Taguchi model involves (execution, waiting, receiving) time and (receiving and processing) cost (Tsai et al., 2013). Some authors proposed an approach to calculate the Pareto total cost and makespan (Chand et al., 2011; Chand et al., 2014). SGO is used to map the application to cloud resources based on computation cost and transmission cost.
A MapReduce-based approach includes three phases: comp, group and part (Kolb et al., 2012). A load balancer between the map and reduce phases is recommended by many authors to deal with overfilling (Al Nuaimi et al., 2012; Sudeepa et al., 2014; Gunarathne et al., 2010; Singh et al., 2013). Several authors have proposed SGO mapping in the cloud (Satapathy et al., 2016), but it has not been used in a multi-cloud, and these authors did not propose scheduling after mapping. Researchers used past data and present states in a genetic algorithm scheduling strategy, where the best solution has high control and low cost on the system (Hu et al., 2010).
Authors proposed an ant technique to gather data about the position of cloud modules before allotting tasks to virtual machines (Nishant et al., 2012; Zhang et al., 2010). A method of load balancing using ACNT in OCCF was also proposed (Zhang et al., 2010). Mostly, the ant technique is used for identifying the less utilised nodes to which external jobs can be assigned, so as to attain load balancing (Yadav et al., 2018; Bhateja et al., 2016).
The WLC algorithm was proposed, in which the existing jobs were allotted to the VMs of clouds with the smallest number of network connections (Ren et al., 2011; Singh et al., 2014).
The present model consists of SGO-based resource allotment and SJF scheduling. Its main goal is to assign resources effectively in order to reduce the makespan time and improve customer satisfaction. Authors (Panda et al., 2016; Bhateja et al., 2016; Dey et al., 2014) defined a connection between the customer satisfaction rate and the makespan time, but they ignored the waiting time.
If many reserved jobs exist, then the best provisioning occurs before their execution, as these requests are registered in advance; provisioning is more dynamic for unreserved jobs, where there is not much time for the cloud federation to auto-scale. A connection between job type, makespan time, waiting and processing time, and the customer satisfaction rate is presented here.
HETEROGENEOUS CLOUD ARCHITECTURE
To offer services to many users, several data centres are connected. Cloud customers send their requirements to the cloud agent, and the resource allotments to jobs are then completed with the available resources. The resources are restricted, so the processing capacities of the resources are also restricted; as a result, the external load is dynamic.
The main aim of the heterogeneous cloud is to implement the many applications submitted by customers while trying to reduce the waiting and makespan times. The number of servers in a heterogeneous cloud is high compared with a single cloud service provider. Every application splits into many autonomous jobs, and these jobs are assigned to virtual machines. In a heterogeneous cloud, assigning virtual machines to tasks is difficult: scheduling tasks with different configurations and different SLAs is challenging. Figure 1 gives a pictorial representation of the heterogeneous cloud. It consists of many cloud providers. The cloud agent is accountable for allotting a distinct id to each external request after the service level agreement is communicated between the customer and the service provider. External applications are categorized into various jobs; likewise, the physical devices split into many virtual machines. SGO is used to identify the most suitable job-virtual machine pair to be processed; SGO is a novel technique well suited to load balancing. After resource allotment, various jobs may be allotted to a particular virtual machine. We used the FIFO and SJF algorithms: if several tasks are assigned to a particular virtual machine, then the tasks with the shortest burst time are processed earliest. With the SGO resource allotment technique the load balance and utilisation of resources are best, and by following it with SJF, the makespan time is reduced. Figure 1 shows the model supporting SGO-based mapping of tasks to balance the load equally among the available VMs, followed by SJF scheduling.
A new cloud customer who wants to deploy applications must choose among the several providers that exist in the world; such a customer depends on the chronological performance of service providers to validate their tendency and reliability. There is a connection between customer satisfaction and the estimated waiting time, the time of completion and the type of job. A service provider with the best rate of user satisfaction can survive for a long time in the market.
The efficiency of the cloud providers, physical machines and virtual machines is registered in a historical user feedback record. The major components of the cloud are allied to the cloud agent so that the quality of service is maintained according to the service level agreement.
EXAMPLE
Let Cld be a group of service providers, where Cld = {Cld1, Cld2, …, Cldi}, and let A be a set of applications, where A = {A1, A2, …, Aj}. Every cloud client has multiple job requests.
Every application is split into several independent tasks, such that Aij = {A11, A12, …, Aq1, Aq2, …, Aqi} is the set of tasks and Cldij = {Cld11, Cld12, …, Cldp1, Cldp2, …, Cldpi} is the set of virtual machines. The mapping function is M: Aij → Cldij. The major goal is to map the tasks of users to virtual machines so that most of the customer jobs are completed in a short period of time and maximum customer satisfaction can be guaranteed. Here job requests are classified into two types: reserved jobs and
Figure 1. Representation of Heterogeneous-cloud group
18
International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018
unreserved jobs. Unreserved jobs are preemptive but reserved jobs are non-preemptive. Reserved jobs
priority is more than unreserved jobs and service charge for reserved job is more than unreserved job.
In our model, there are four roles, namely clients, cloud applications, application owners and the heterogeneous cloud group. The owner deploys the application on the cloud platform and offers services to the clients. Users submit job requests, with performance requirements, to the application. The application owner rents resources from the cloud service provider to process the external requests and the applications. The main aim is to mechanise the procedure of resource provisioning in the heterogeneous cloud group.
All external requests are executed using the minimum of physical resources and virtual machines by means of the auto-scaling technique.
Consider the following two applications in Figure 2.
Based on domain knowledge, a cloud application is composed of a group of services following certain rules. A service unit of an application is an abstraction of a component, such as paying bills through an online bill desk, booking tickets, or other service-oriented web applications.
Every application is divided into many autonomous jobs. Applications of two types are considered in our method, i.e., reserved and unreserved applications.
The cloud group offers various kinds of virtual machines, suitable for different loads. The virtual machines have different processing capacities and service costs. We assume that every virtual machine is capable of executing all requests.
The arrival of jobs is not known to the application owner in advance. Hence, the load is defined as all the incoming jobs submitted through the heterogeneous cloud that can be processed by the application environment.
Figure 2. Directed acyclic graph of applications; (a) Application 1, (b) Application 2
19
International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018
The auto-scaling method has to make two decisions:
1. Allocation of a virtual machine for executing the tasks. To find the best virtual machine-job pair, with high throughput and low makespan time, SGO operators are used;
2. Scheduling of jobs, i.e., the sequence in which jobs are processed when more than one job is allotted to a particular virtual machine. We use conventional SJF scheduling to increase the throughput and reduce the makespan time.
The main aim is to identify a plan for resource allotment and task scheduling that improves the customer satisfaction rate. Several service providers exist in the market, and a customer depends on previous reviews of a particular service provider. To achieve a high rate of customer satisfaction, the factors to be considered are waiting time, makespan time and throughput.
Reserved jobs may wait for resources that are being used by unreserved jobs. In this situation, the unreserved jobs are preempted and saved in a provisional queue until the reserved job completes execution and resources are allocated to it. In the present scenario, the service charge of reserved tasks is high compared to unreserved tasks. Multiple jobs can be processed simultaneously in the cloud based on priority and with a valid request-id. Usually cloud service providers follow sequential order. Providers have various execution and storage capacities, such as low, medium and high. User classification is based on the customers' selections and the resource capacity. The mapping algorithm depends on SGO followed by SJF scheduling. We observed that several jobs are allotted to a particular virtual machine for execution. A user can choose the job type to be reserved or unreserved; the nature of a task is decided by the user on a pay-per-use basis.
SGOCARAJS TECHNIQUE
The proposed method considers r clouds and maps p autonomous jobs to q virtual machines so that every job is processed with the least makespan time and the utmost rate of user satisfaction.
Effective resource allotment using SGO-based mapping consists of mapping the jobs to virtual machines and then scheduling the jobs using the SJF technique. In our simulation, we used the SGO-based resource allotment algorithm and compared it with Teaching Learning Based Optimization (Rao et al., 2013; Satapathy et al., 2016; Satapathy, Suresh Chandra et al., 2017) and the Customer Oriented Task Scheduling algorithm (Panda et al., 2016).
As the number of external applications increases, the complexity of job scheduling and resource allocation increases. In the heterogeneous cloud, there are various resources and applications which are processed dynamically, so a strong optimization method is required for application scheduling. To satisfy the quality of service, reduce the makespan time and make the best use of resources, SGO is a well-suited algorithm.
Figure 3 shows the process of resource allocation, where every client submits applications to the cloud group. Appropriate techniques are used to allot resources to the tasks by considering optimization objectives such as resource utilisation, low makespan and high customer satisfaction. Such optimisation problems can also be solved by techniques like Ant Colony Optimisation and Particle Swarm Optimisation; in this paper, the proposed resource allotment algorithm for the heterogeneous cloud depends on SGO, with a few changes.
The mapping method is used for resource allocation. It detects the virtual machine-job pairs that reduce the makespan time, achieve a high rate of client satisfaction and balance the load across all available resources. As the search space is enormous, SGO is used to find the best job pair with the least makespan time and the highest rate of user satisfaction.
Figure 3. Allocation of resources in heterogeneous cloud group
SGO-based Client-Awareness Allocation of Resources and Scheduling of Jobs in cloud stores all incoming jobs in a queue. Initially the queue is empty; as valid jobs start arriving, every application splits into many jobs.
The method selects a job for a particular virtual machine using an algorithm based on SGO (Satapathy, S et al., 2016). It identifies the requirements of the unassigned jobs and compares the completion times of the other readily available jobs to identify the best job-virtual machine pair with the minimum makespan time. This process is repeated until every unallotted job is assigned to a particular virtual machine.
Load balancing can be stated as mapping “x” jobs submitted by clients onto “y” virtual machines in the heterogeneous cloud. The processing power of a virtual machine determines the makespan time and the resource waiting time:
f(x) = minimum(makespan) + 1/maximum(cust_sat_rate)  (1)

makespan = f(MIPS_provisional_task, EST)  (2)

makespan = w1 * (NIC/MIPS) + w2 * EST  (3)

cust_sat_rate = f(EST_provisional_task, PTC_provisional_task)  (4)
The computability value, known as the makespan time of every executing unit, represents the resource utilisation rate in Equations 2 and 3, and Equation 4 shows the association among the rate of user satisfaction, the resource waiting time and the estimated completion time.
NIC denotes the number of instructions (in millions) in a job, MIPS denotes the millions of instructions per second processed by the device, and w1 and w2 are weights. Selecting the weight values is difficult because they vary from one industry to another; w1 and w2 range between 0.2 and 0.8 such that their sum is 1.
Algorithm 1: SGOCARAJS
Input: i) A group of requests from users, arriving according to a Poisson distribution
       ii) A group of tasks
       iii) A group of service providers in the heterogeneous cloud group
       iv) A group of virtual machines
Output: i) Rate of consumer satisfaction
        ii) Makespan time
while Qmain != Null do
    Set makespan = 0
    Split every application into several jobs
    while the termination condition is not satisfied do
        Map the jobs in a particular group to clouds arbitrarily
        Compute fitness = minimum(makespan) + 1/maximum(cust_sat_rate)
        Search for gbest, the finest mapping in the group
        // Improving phase
        for i = 1 : cldp do
            for j = 1 : Aq do
                Mapping_new(i, j) = c * Mapping_old(i, j) + r * (gbest(j) - Mapping_old(i, j))
        // Acquiring phase
        for i = 1 : cldp do
            Arbitrarily select one mapping X_r, where i != r
            if f(Mapping_i) < f(Mapping_r) then
                for j = 1 : Aq do
                    Mapping_new(i, j) = Mapping_old(i, j) + r1 * (Mapping(i, j) - Mapping(r, j)) + r2 * (gbest(j) - Mapping(i, j))
            else
                for j = 1 : Aq do
                    Mapping_new(i, j) = Mapping_old(i, j) + r1 * (Mapping(r, j) - Mapping(i, j)) + r2 * (gbest(j) - Mapping(i, j))
            Accept Mapping_new if it gives a better fitness value
    end while
    Call TaskScheduling[PTC, EST, p, q, makespan]   // PTC is the predicted time to compute
end while
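The two update rules above are the improving and acquiring phases of SGO (Satapathy & Naik, 2016). A minimal Python sketch over real-valued mapping vectors is given below; the rounding and repair needed to turn these vectors back into valid job-to-VM indices, which the algorithm would also require, is omitted:

```python
import random

def sgo_generation(pop, fitness, c=0.2):
    """One SGO generation: improving phase, then acquiring phase (a sketch)."""
    gbest = min(pop, key=fitness)
    # Improving phase: every person moves toward the best person (gbest)
    for i, x in enumerate(pop):
        r = random.random()
        cand = [c * xj + r * (gj - xj) for xj, gj in zip(x, gbest)]
        if fitness(cand) < fitness(x):
            pop[i] = cand
    # Acquiring phase: learn from a random partner and from gbest
    for i, x in enumerate(pop):
        xr = pop[random.choice([k for k in range(len(pop)) if k != i])]
        r1, r2 = random.random(), random.random()
        better, worse = (x, xr) if fitness(x) < fitness(xr) else (xr, x)
        cand = [xj + r1 * (bj - wj) + r2 * (gj - xj)
                for xj, bj, wj, gj in zip(x, better, worse, gbest)]
        if fitness(cand) < fitness(x):
            pop[i] = cand
    return pop
```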
Algorithm 2: ReservedJobLessMakeSpanTime(PTC_Reserved, PT_i)
Start
while Qreserved != empty do
    Qreserved ← Qtemp
    for pc = 1, 2, 3, ..., a do   // pc is a provisional cloud
        for pt = 1, 2, 3, ..., r do   // pt is a provisional task
            Sort the provisional tasks in Qreserved
            pt_min = the allocated task with the minimum predicted time
            MakeSpan(pt, r) = PTC(pt, r) + WT(pt)   // WT is the waiting time
End
Algorithm 3: UnreservedJobMakeSpanTime(PTC_Unreserved, pt_i)
Start
while Qunreserved != null do
    Qunreserved ← Qtemp
    for pc = 1, 2, 3, ..., a do
        for pt = 1, 2, 3, ..., r do
            Sort the provisional tasks in Qunreserved
            pt_min = the allocated task with the minimum predicted time
            MakeSpanTime(pt, r) = PTC(pt, r) + WT(pt)
End
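Algorithms 2 and 3 share the same core computation: sort the jobs queued on a virtual machine by predicted time to compute (PTC) and accumulate the waiting time. A small sketch with hypothetical names:

```python
def queue_makespans(ptc: dict, jobs: list) -> dict:
    """MakeSpan(pt) = PTC(pt) + WT(pt) for the jobs queued on one VM,
    processed in ascending order of PTC (shortest first)."""
    makespans, waiting = {}, 0.0
    for job in sorted(jobs, key=lambda j: ptc[j]):
        makespans[job] = ptc[job] + waiting   # finish time of this job
        waiting += ptc[job]                   # later jobs wait for this one
    return makespans
```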
In the scheduling procedure, the allotted cloud-job pairs are scheduled consecutively such that the makespan time is low and the rate of user satisfaction is high. In this method the load is equally distributed, so that no server is overloaded. The SGO-based client-awareness allocation of resources and scheduling of jobs algorithm is described above.
The jobs are scheduled based on priority: reserved jobs are executed before unreserved jobs. By following the above process, a higher rate of user satisfaction can be attained. Balancing the jobs has two merits:
1. It maximizes the rate of customer satisfaction;
2. It decreases the number of steps necessary to allot the jobs.
Representation of SGO-Based Client-Awareness Allocation of Resources and Scheduling of Jobs in Cloud
Consider 8 jobs (J1 to J8) mapped to 4 clouds (Cld1 to Cld4). A PTC matrix is generated from the estimated processing times for the mapping process and is shown in Table 1. Table 2 contains the SGO mapping.
The scheduling sequence and the Gantt chart of SGO-based user-awareness allocation of resources and scheduling of jobs in cloud are presented in Table 3 and Table 4, in which ‘*’ represents the inactive time.
Table 1. PTC matrix

Jobs/Clouds   Cld1   Cld2   Cld3   Cld4
J1              20     80     60     40
J2              70     30     90    120
J3              80     90     50    100
J4             120    160    130     80
J5             130    170    140     60
J6              90     40     70     65
J7              90    110    130    120
J8              50     90     30     80
Table 2. Mapping sequence for SGO-based Client-Awareness Allocation of Resources and Scheduling of Jobs in cloud

Job/Cloud   Cld1   Cld2   Cld3   Cld4
J1            20
J2                    30
J3                           50
J4                                  80
J5                                  60
J6                    40
J7            90
J8                           30
Table 3. Scheduling sequence for SGO-based Client-Awareness Allocation of Resources and Scheduling of Jobs in cloud

Job   Cloud   PTC   Start time   Finish time
J1    Cld1     20        0            20
J2    Cld2     30        0            30
J8    Cld3     30        0            30
J6    Cld2     40       30            70
J3    Cld3     50       30            80
J5    Cld4     60        0            60
J4    Cld4     80       60           140
J7    Cld1     90       20           110
Table 4. Gantt chart for SGO-based Client-Awareness Allocation of Resources and Scheduling of Jobs in cloud (‘*’ denotes inactive time)

Cloud   Busy intervals                  Inactive (‘*’)
Cld1    J1: 0-20;  J7: 20-110           110-140
Cld2    J2: 0-30;  J6: 30-70            70-140
Cld3    J8: 0-30;  J3: 30-80            80-140
Cld4    J5: 0-60;  J4: 60-140           none
In the mapping process, job J1 is mapped to Cld1, J2 to Cld2, J3 to Cld3, J4 to Cld4, J5 to Cld4, J6 to Cld2, J7 to Cld1 and J8 to Cld3.
Thus jobs J1 and J7 are allotted to Cld1, jobs J2 and J6 to Cld2, jobs J3 and J8 to Cld3, and jobs J4 and J5 to Cld4.
After the jobs are assigned to clouds, effective scheduling of the jobs is carried out. We used SJF scheduling, so that the makespan time is reduced. The Gantt chart of SGO-based Client-Awareness Allocation of Resources and Scheduling of Jobs in cloud is shown in Table 4.
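As a check, applying the SJF rule to the jobs allotted to Cld1 reproduces the corresponding rows of Table 3 (a small sketch using the PTC values of Table 1):

```python
ptc = {"J1": 20, "J7": 90}            # jobs allotted to Cld1 (Table 2)
start, schedule = 0, []
for job in sorted(ptc, key=ptc.get):  # shortest burst time first (SJF)
    schedule.append((job, start, start + ptc[job]))
    start += ptc[job]
print(schedule)  # [('J1', 0, 20), ('J7', 20, 110)] -- matches Table 3
```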
PERFORMANCE METRICS OF RATE OF USER SATISFACTION
The efficiency of the existing algorithms (Panda et al., 2016; Bhateja et al., 2016) is compared with the proposed method using performance parameters such as makespan time and rate of user satisfaction.
The rate of user satisfaction is the feedback stated by clients for a cloud service provider, concerning whether application deployment is done effectively, in time and with a suitable pricing strategy. Consider two algorithms, SGO-based Client-Awareness Allocation of Resources and Scheduling of Tasks in cloud (X) and customer-oriented task scheduling (Y), processing 5 jobs J1, J2, J3, J4 and J5. The estimated completion times for these jobs under algorithm X are 30, 50, 20, 60 and 100, whereas under algorithm Y they are 40, 30, 60, 80 and 70. Hence algorithm X has user_satisfaction_rate(X) = 3, as X takes the least time for jobs J1, J3 and J4, whereas algorithm Y has user_satisfaction_rate(Y) = 2, because it takes the least time for jobs J2 and J5.
The mathematical formula for measuring the rate of client satisfaction is defined in the following Equations 5, 6 and 7:

$$\mathrm{client\_satisfaction\_rate}(X) = \sum_{i=1}^{n} PTC_i + \sum_{i=1}^{n} C_i + w_2 \sum_{i=1}^{n} EST_i \qquad (5)$$
The rate of client satisfaction depends on the following factors:
1. Estimated completion time;
2. C (customised pricing strategy);
3. Resource waiting time.
The higher the waiting time, the lower the rate of user satisfaction. w2 is a user-specified constant ranging between 0.02 and 0.06.
Several users depend on the past feedback for a service provider, and user satisfaction is taken from cloud customers when selecting a consistent service provider for an organization:

$$PTC_i(J_i) = \begin{cases} 1, & PTC_X(J_i) < PTC_Y(J_i) \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
Here, PTC_X(J_i) and PTC_Y(J_i) denote the predicted completion time of job J_i under algorithm X and algorithm Y, respectively:

$$C_i(J_i) = \begin{cases} 1, & \text{if job } J_i \text{ is reserved} \\ 0, & \text{if job } J_i \text{ is unreserved} \end{cases} \qquad (7)$$
The pricing policy for reserved jobs is higher than for unreserved jobs (Sotomayor, 2014). Unreserved jobs are preemptive, but reserved jobs are not. C is the pricing strategy, and the rate of user satisfaction depends on a low completion time.
EST is the resource waiting time and w2 is a weight which ranges from 0.02 to 0.06. Many new stakeholders depend on the past client satisfaction rate when selecting a reliable service provider for an organization.
Result Analysis
For the experiments, we used CloudSim version 3.0.3. The simulations ran on an Intel Xeon E5-1603 CPU @ 2.80 GHz (x64) with 8 GB of RAM, running Windows 10. The parameter values used in the experiments are given in Table 5.
The performance measures, rate of customer satisfaction and makespan time, are compared across the three algorithms TLBO, COTS and SGOCARAJS in cloud. The makespan time and rate of user satisfaction of the SGOCARAJS in cloud algorithm are measured for several dataset sizes and compared with the Customer Oriented Task Scheduling (Al Nuaimi et al., 2012; Jagatheesan et al., 2015; Chatterjee et al., 2016) and Teaching Learning Based Optimization algorithms in Table 6 and Table 7, respectively.
Table 5. Dataset types and values

Type                Values
Number of clouds    4, 8, 12, 16, 20
Number of jobs      128, 256, 512, 1024
Chunks of data      jobs * virtual machines
Table 6. Client satisfaction rate comparison for TLBO, COTS and SGOCARAJS in cloud

Dataset    Algorithm    Simulation (i)   Simulation (ii)   Simulation (iii)
128*20     COTS         3.180e+02        3.121e+02         2.352e+02
           TLBO         3.694e+02        4.159e+02         3.278e+02
           SGOCARAJS    4.159e+02        4.783e+02         3.869e+02
256*30     COTS         3.196e+02        4.387e+02         4.299e+02
           TLBO         4.097e+02        3.967e+02         4.159e+02
           SGOCARAJS    4.283e+02        4.555e+02         4.963e+02
512*40     COTS         5.891e+02        4.914e+02         4.998e+02
           TLBO         5.624e+02        5.589e+02         4.599e+02
           SGOCARAJS    7.441e+02        6.501e+02         5.223e+02
1024*50    COTS         3.879e+02        5.223e+02         4.720e+02
           TLBO         3.909e+02        5.334e+02         4.995e+02
           SGOCARAJS    4.534e+02        5.776e+02         5.321e+02
Table 7. Makespan time comparison for TLBO, COTS and SGOCARAJS in cloud

Dataset    Algorithm    Simulation (i)   Simulation (ii)   Simulation (iii)
128*20     COTS         4.225e+02        5.312e+02         6.332e+02
           TLBO         4.110e+02        5.220e+02         6.1134e+02
           SGOCARAJS    3.512e+02        4.567e+02         4.591e+02
256*30     COTS         6.229e+02        5.567e+02         6.742e+02
           TLBO         6.123e+02        5.453e+02         6.593e+02
           SGOCARAJS    5.899e+02        4.764e+02         5.899e+02
512*40     COTS         5.914e+02        4.823e+02         5.876e+02
           TLBO         5.762e+02        4.679e+02         5.678e+02
           SGOCARAJS    5.329e+02        4.239e+02         4.816e+02
1024*50    COTS         6.400e+02        8.492e+02         6.983e+02
           TLBO         6.378e+02        8.390e+02         6.679e+02
           SGOCARAJS    5.156e+02        7.777e+02         5.435e+02
The results show that SGOCARAJS in cloud achieves the highest customer satisfaction rate on all the datasets (128*20, 256*30, 512*40 and 1024*50; Figure 4(a)-(d)) in comparison with Customer Oriented Task Scheduling and Teaching Learning Based Optimization.
The efficiency of the proposed algorithm in terms of makespan time is shown in Figure 4(e)-(h): SGOCARAJS in cloud has a lower makespan time than the Customer Oriented Task Scheduling and Teaching Learning Based Optimization algorithms. The efficiency of the COTS algorithm lies between those of the TLBO and SGOCARAJS in cloud algorithms for both the rate of customer satisfaction and the makespan time. Figure 5 compares the makespan times of the COTS, SGOCARAJS in cloud and TLBO algorithms.
Figure 4. Client satisfaction rate for datasets (a) 128*20, (b) 256*30, (c) 512*40, (d) 1024*50; makespan time for datasets (e) 128*20, (f) 256*30, (g) 512*40, (h) 1024*50
Figure 5. Comparison of makespan time
The client satisfaction rates of the COTS, SGOCARAJS in cloud and TLBO algorithms are listed in Table 6, and Figure 6 shows the comparison of the rates of customer satisfaction.
We report the experimental results for the proposed algorithm on two performance parameters, namely the rate of user satisfaction and the makespan time. The rate of user satisfaction of the proposed algorithm is measured over 20 instances on various chunks of data and compared with the COTS and TLBO algorithms. The results show that the proposed SGOCARAJS algorithm outperforms the COTS and TLBO algorithms on every instance.
Figure 6. Comparison of rate of customer satisfaction
CONCLUSION
A job scheduling algorithm known as SGOCARAJS in cloud has been proposed. Customer Oriented Task Scheduling aims to improve client satisfaction, and Teaching Learning Based Optimization is used to identify the job-virtual machine pair with the least makespan time, whereas the SGOCARAJS algorithm focuses on both reducing the jobs' makespan time and improving user satisfaction. The algorithm basically comprises two phases, i.e., mapping and scheduling. Many instances were processed in CloudSim with various chunks of data. The results show that the efficiency of the proposed algorithm surpasses that of the existing methods.
REFERENCES
Singh, D. P., Bhateja, V., Soni, S. K., & Shukla, A. K. (2014). A novel cluster head selection and routing
scheme for wireless sensor networks. In Advances in Signal Processing and Intelligent Recognition Systems
(pp. 403–415). Springer.
Al Nuaimi, K., Mohamed, N., Al Nuaimi, M., & Al-Jaroodi, J. (2012) A survey of load balancing in cloud
computing: Challenges and algorithms. In Second Symposium on Network Cloud Computing and Applications
(pp. 137-142). doi:10.1109/NCCA.2012.29
Armstrong, R., Hensgen, D., & Kidd, T. (1998). The relative performance of various mapping algorithms is
independent of sizable variances in run-time predictions. In Proceedings of Seventh Heterogeneous Computing
Workshop (pp. 79-87). doi:10.1109/HCW.1998.666547
Bhateja, V., Krishn, A., & Sahu, A. (2016) Medical image fusion in curvelet domain employing PCA and
maximum selection rule. In Proceedings of the Second International Conference on Computer and Communication
Technologies. New Delhi: Springer.
Bhateja, V., Sharma, A., Tripathi, A., Satapathy, S. C., & Le, D. N. (2016, December). An Optimized Anisotropic
Diffusion Approach for Despeckling of SAR Images. In Annual Convention of the Computer Society of India
(pp. 134-140). Springer, Singapore. doi:10.1007/978-981-10-3274-5_11
Bhateja, V., Tripathi, A., Sharma, A., Le, B. N., Satapathy, S. C., Nguyen, G. N., & Le, D. N. (2016, November).
Ant Colony Optimization Based Anisotropic Diffusion Approach for Despeckling of SAR Images. In International
Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making (pp. 389-396). Springer.
doi:10.1007/978-3-319-49046-5_33
Cao, Q., Wei, Z. B., & Gong, W. M. (2009) An optimized algorithm for task scheduling based on activity
based costing in cloud Computing. In 3rd IEEE International Conference on Bioinformatics and Biomedical
Engineering. doi:10.1109/ICBBE.2009.5162336
Chand, P., & Mohanty, J. R. (2011). Multi objective genetic approach for solving vehicle routing problem with time window. In Trends in Computer Science (pp. 336–343). Springer Berlin Heidelberg. doi:10.1007/978-3-642-24043-0_34
Chand, P., & Mohanty, J. R. (2014). Environmental multi objective uncertain transport trail model using
variant of predator prey evolutionary strategy. International Journal of Applied Decision Sciences, 8(1), 21–51.
doi:10.1504/IJADS.2015.066556
Chandrasekaran, K., & Divakarla, U. (2013). Load Balancing of Virtual Machine Resources in Cloud Using
Genetic Algorithm.
Chatterjee, S., Ghosh, S., Dawn, S., Hore, S., & Dey, N. (2016). Forest type classification: a hybrid NN-GA
model based approach. In Information systems design and intelligent applications (pp. 227–236). New Delhi:
Springer. doi:10.1007/978-81-322-2757-1_23
Dam, S., Mandal, G., Dasgupta, K., & Dutta, P. (2015) Genetic algorithm and gravitational emulation based hybrid
load balancing strategy in cloud computing. In Third International Conference on Computer, Communication,
Control and Information Technology (C3IT). doi:10.1109/C3IT.2015.7060176
Dey, N., Samanta, S., Chakraborty, S., Das, A., Chaudhuri, S. S., & Suri, J. S. (2014). Firefly algorithm
for optimization of scaling factors during embedding of manifold medical information: An application in
ophthalmology imaging. Journal of Medical Imaging and Health Informatics, 4(3), 384–394. doi:10.1166/
jmihi.2014.1265
Fang, Y., Wang, F., & Ge, J. (2010). A task scheduling algorithm based on load balancing in cloud Computing.
In International Conference on Web Information Systems and Mining (pp. 271-277). Springer Berlin Heidelberg.
doi:10.1007/978-3-642-16515-3_34
Gunarathne, T., Wu, T. L., Qiu, J., & Fox, G. (2010, November). MapReduce in the Clouds for Science. In
Second International Conference on Cloud Computing Technology and Science (pp. 565-572).
Hu, J., Gu, J., Sun, G., & Zhao, T. (2010) A scheduling strategy on load balancing of virtual machine resources
in cloud computing environment. In Third International Symposium on Parallel Architectures, Algorithms and
Programming (pp. 89-96).
Jagatheesan, K., Anand, B., Dey, N., & Ashour, A. S. (2015). Artificial intelligence in performance analysis of
load frequency control in thermal-wind-hydro power systems. Int. J. Adv. Comput. Sci. Appl., 6(7).
Jang, S. H., Kim, T. Y., Kim, J. K., & Lee, J. S. (2012). The study of genetic algorithm-based task scheduling
for cloud computing. International Journal of Control and Automation, 5, 157–162.
Javanmardi, S., Shojafar, M., Amendola, D., Cordeschi, N., Liu, H., & Abraham, A. (2014) Hybrid job
scheduling algorithm for cloud computing environment. In Proceedings of the Fifth International Conference on
Innovations in Bio-Inspired Computing and Applications IBICA (pp. 43-52). Springer International Publishing.
doi:10.1007/978-3-319-08156-4_5
Kolb, L., Thor, A., & Rahm, E. (2012) Load balancing for mapreduce-based entity resolution. In 28th International
Conference on Data Engineering (pp. 618-629).
Le, D. N., Nguyen, G. N., Bhateja, V., & Satapathy, S. C. (2017). Optimizing feature selection in video-based
recognition using Max–Min Ant System for the online video contextual advertisement user-oriented system.
Journal of Computational Science, 21, 361–370. doi:10.1016/j.jocs.2016.10.016
Li, J. F., & Peng, J. (2011). Task scheduling algorithm based on improved genetic algorithm in cloud computing
environment. Jisuanji Yingyong, 31(1), 184–186. doi:10.3724/SP.J.1087.2011.00184
Nishant, K., Sharma, P., Krishna, V., Gupta, C., Singh, K. P., & Rastogi, R. (2012) Load balancing of nodes in
cloud using ant colony optimisation. In 14th International Conference on Computer Modelling and Simulation.
Panda, S. K., & Das, S. (2016). A Customer-Oriented Task Scheduling for Heterogeneous Multi-Cloud
Environment. International Journal of Cloud Applications and Computing, 6(4). doi:10.4018/IJCAC.2016100101
Panda, S. K., & Jana, P. K. (2015). Efficient task scheduling algorithms for heterogeneous multi-cloud
environment. The Journal of Supercomputing, 71(4), 1505–1533. doi:10.1007/s11227-014-1376-6
Pandey, S., Wu, L., Guru, S. M., & Buyya, R. (2010). A particle swarm optimisation-based heuristic for scheduling
workflow applications in cloud Computing environments. In 24th IEEE International Conference on Advanced
Information Networking and Applications (pp. 400-407).
Praveen, S. P., Rao, K. T., & Janakiramaiah, B. (2017). Effective allocation of resources and task scheduling in
cloud environment using social group optimization. Arabian Journal for Science and Engineering. doi:10.1007/
s13369-017-2926-z
Rao, R. V., & Kalyankar, V. D. (2013). Parameter optimisation of modern machining processes using teaching–
learning-based optimisation algorithm. Engineering Applications of Artificial Intelligence, 26(1), 524–531.
doi:10.1016/j.engappai.2012.06.007
Ren, X., Lin, R., & Zou, H. (2011). A dynamic load balancing strategy for cloud computing platform based on
exponential smoothing forecast. In International Conference on Cloud Computing and Intelligent Systems (pp.
220-224). doi:10.1109/CCIS.2011.6045063
S. C. Satapathy, V. Bhateja, & S. Das (Eds.). (2017). Smart Computing and Informatics: Proceedings of the
First International Conference on Sci 2016 (Vol. 1). Springer.
Satapathy, S. C., Bhateja, V., Raju, K. S., & Janakiramaiah, B. (2016). Computer Communication, Networking
and Internet Security. In Proceedings of IC3T.
Satapathy, S. C., Mandal, J. K., Udgata, S. K., & Bhateja, V. (2016). Information Systems Design and Intelligent
Applications. Springer India.
Satapathy, S., & Naik, A. (2016). Social group optimization (SGO): A new population evolutionary optimization technique. Complex & Intelligent Systems, 2(3), 173–203.
Singh, D. P., Bhateja, V., & Soni, S. K. (2013). An efficient cluster-based routing protocol for WSNs using
time series prediction-based data reduction scheme. International Journal of Measurement Technologies and
Instrumentation Engineering, 3(3), 18–34. doi:10.4018/ijmtie.2013070102
Singh, D. P., Bhateja, V., & Soni, S. K. (2014). Energy optimization in WSNs employing rolling grey model.
In International Conference on Signal Processing and Integrated Networks (SPIN). IEEE. doi:10.1109/
SPIN.2014.6777064
Singh, D. P., Bhateja, V., & Soni, S. K. (2014). Prolonging the lifetime of wireless sensor networks using prediction
based data reduction scheme. In International Conference on Signal Processing and Integrated Networks (SPIN).
IEEE. doi:10.1109/SPIN.2014.6776990
Sotomayor, B. (2014). Haizea. Computation Institute, University of Chicago. Retrieved from http://haizea.cs.uchicago.edu/whatis.html
Sudeepa, R., & Guruprasad, H. S. (2014). Resource allocation in cloud computing. International Journal of
Modern Communication Technologies & Research, 2(4), 19–21.
Tsai, J. T., Fang, J. C., & Chou, J. H. (2013). Optimized task scheduling and resource allocation on cloud
computing environment using improved differential evolution algorithm. Computers & Operations Research,
40(12), 3045–3055. doi:10.1016/j.cor.2013.06.012
Yadav, P., Sharma, S., Tiwari, P., Dey, N., Ashour, A. S., & Nguyen, G. N. (2018). A Modified Hybrid Structure
for Next Generation Super High Speed Communication Using TDLTE and Wi-Max. In Internet of Things and Big
Data Analytics Toward Next-Generation Intelligence (pp. 525–549). Springer. doi:10.1007/978-3-319-60435-0_21
Zhang, Z., & Zhang, X. (2010) A load balancing mechanism based on ant colony and complex network theory in
open cloud computing federation. In 2nd International Conference on Industrial Mechatronics and Automation
(Vol. 2, pp. 240-243). doi:10.1109/ICINDMA.2010.5538385
VRoptBees:
A Bee-Inspired Framework for Solving
Vehicle Routing Problems
Thiago A.S. Masutti, Mackenzie Presbyterian University, Sao Paulo, Brazil
Leandro Nunes de Castro, Mackenzie Presbyterian University, Sao Paulo, Brazil
ABSTRACT
Combinatorial optimization problems are broadly studied in the literature. On the one hand, their
challenging characteristics, such as the constraints and the number of potential solutions, inspire their use to test new solution techniques. On the other hand, the practical application of these problems supports the daily tasks of people and companies. Vehicle routing problems constitute a well-known class of combinatorial optimization problems, of which the Traveling Salesman Problem
(TSP) is one of the most elementary ones. TSP corresponds to finding the shortest route that visits
all cities within a path returning to the start city. Despite its simplicity, the difficulty in finding its
exact solution and its direct application in practical problems in multiple areas make it one of the
most studied problems in the literature. Algorithms inspired by biological phenomena are being
successfully applied to solve optimization tasks, mainly combinatorial optimization problems. Those
inspired by the collective behavior of insects produce good results for solving such problems. This
article proposes the VRoptBees, a framework inspired by honeybee behavior to tackle vehicle routing
problems. The framework provides a flexible and modular tool to easily build solutions to vehicle
routing problems. Together with the framework, two examples of implementation are described, one
to solve the TSP and the other to solve the Capacitated Vehicle Routing Problem (CVRP). Tests were
conducted with benchmark instances from the literature, showing competitive results.
Keywords
Bee-Inspired Algorithms, Optimization, Swarm Intelligence, Travelling Salesman Problem, Vehicle Routing Problem
1. INTRODUCTION
Optimization consists of determining the values of a set of variables that minimize or maximize a
given mathematical expression, satisfying all problem constraints (Cunha et al., 2012; Michalewicz &
Fogel, 2013). An intuitive way to solve a given optimization problem is to list all possible solutions,
evaluate them, and use the best solution found. However, depending on the characteristics of the
problem, this approach, known as exhaustive search, is not efficient. The main drawback of using full
enumeration is that it becomes computationally impractical depending on the number of possible
solutions to the problem. This means that this exact solution approach is valid only for simpler problems, which hardly occur in practical applications.
The Vehicle Routing Problem (VRP) corresponds to a class of combinatorial optimization
problems that seek an optimal set of routes for a fleet of vehicles to traverse in order to attend a given set of customers (Lawler, 1985; Laporte, 1992; Reinelt, 1991; da Cunha, 2000; Toth & Vigo, 2001; Gutin & Punnen, 2002; Laporte, 2009). As in most real cases the vehicles have a limited capacity, the VRP
becomes CVRP, which stands for Capacitated Vehicle Routing Problem. In mathematical terms, the
CVRP can be described as follows. Given a number K of identical vehicles, each with a capacity Q, a
depot id, a set S = {i1, i2, i3, ..., in} with n customers, each with a demand qi (i = 1, 2, 3, …, n), and the
cost Cij (i,j = 1, 2, 3, ..., n + 1) of going from customer i to customer j and from each customer to the
depot, the CVRP consists of determining the permutation π of the elements in S so as to minimize:
$$\sum_{k=1}^{K}\left( C_{i_d,\,\pi_{k,1}} + \sum_{i=1}^{n_k-1} C_{\pi_{k,i},\,\pi_{k,i+1}} + C_{\pi_{k,n_k},\,i_d} \right) \qquad (1)$$
subject to:
$$\sum_{i=1}^{n_k} q_{\pi_{k,i}} \le Q \qquad \forall k \qquad (2)$$
where πk,i represents the city visited in order i in route k, and nk is the number of customers visited
by vehicle k. For a review of practical applications and solution algorithms the work of Laporte et
al. (2000) and Laporte (2009) are recommended.
If the constraints on vehicle capacity and the demand of each customer (city) are removed, then we have the Multiple Travelling Salesman Problem (MTSP), which is a particular case of the CVRP. Given a number M of salesmen, a depot i_d, a set S = {i_1, i_2, i_3, ..., i_n} with n intermediate cities, and the cost C_ij (i, j = 1, 2, 3, ..., n + 1) of going from one city to another and also from each city to the depot, the MTSP consists of determining the set of permutations π_m (m = 1, 2, 3, …, M) of the elements in S so as to minimize:
$$\sum_{m=1}^{M}\left( C_{i_d,\,\pi_{m,1}} + \sum_{i=1}^{n_m-1} C_{\pi_{m,i},\,\pi_{m,i+1}} + C_{\pi_{m,n_m},\,i_d} \right) \qquad (3)$$
where πm,i represents the city visited in order i in route m, and nm is the number of intermediary cities
visited in route m. The MTSP already presents enough characteristics to represent practical problems,
such as school vehicle routing (Angel et al., 1972). Bektas (2006) presented other examples of practical
applications and algorithms for MTSP.
It is still possible to make another simplification in the MTSP by reducing the number of salesmen
to one, leading to the so-called Travelling Salesman Problem (TSP). Given a set of n cities and the
cost Cij (i,j = 1, 2, 3, …, n) of going from city i to city j, the TSP aims at determining a permutation
π of the cities that minimize:
$$\sum_{i=1}^{n-1} C_{\pi_i,\,\pi_{i+1}} + C_{\pi_n,\,\pi_1} \qquad (4)$$
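Equation (4) translates directly into code; a minimal sketch, assuming a cost matrix C indexed by city and a tour given as a permutation of city indices:

```python
def tour_cost(C, perm):
    """Total cost of visiting the cities in the order of `perm`
    and returning from the last city to the first (Equation 4)."""
    n = len(perm)
    cost = sum(C[perm[i]][perm[i + 1]] for i in range(n - 1))
    return cost + C[perm[n - 1]][perm[0]]
```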
TSP represents one of the elementary vehicle routing problems and, despite its simple description,
its exact solution becomes complex because of the computational cost required, being part of the
class of NP-hard problems (Johnson & McGeoch, 1997). Due to its academic importance and broad
practical application, TSP has been receiving great attention for more than 60 years (Lawler, 1985;
Laporte, 2009). As a result, well-consolidated TSP-oriented review works can be found in the literature
(Laporte, 1992; Johnson & McGeoch, 1997; Gutin & Punnen, 2002), which present formulations,
examples of practical applications and classical solution algorithms.
These combinatorial optimization problems present some features, such as the high number of
possible solutions (search space size) and number and type of constraints, which may prevent their
solution by exhaustive search methods in a feasible time, even if large amounts of computational
resources are employed. Therefore, instead of trying to find exact solutions, successful techniques
propose candidate solutions and consider the history of results in the search process, modifying these
solutions iteratively until a satisfactory solution is found. These methods, normally called heuristics,
exchange the guarantee of the exact solution for a satisfactory solution that can be obtained with an
acceptable computational cost (Cunha et al., 2012; Luke, 2013; Michalewicz & Fogel, 2013).
One form of designing heuristics for solving vehicle routing problems is by looking at nature
and abstracting the way it computes, which leads to an area called Natural Computing or Nature-Inspired Algorithms (de Castro, 2007; Yang, 2010; Zang et al., 2010). The goal of this research area
is to design useful algorithms (heuristics) capable of effectively solving complex problems. Within
natural computing there is a body of works that seek inspiration in social systems, particularly social
insects, to design such heuristics, constituting an area called Swarm Intelligence (SI) (Bonabeau et
al., 1999; Blum & Merkle, 2008). A distinguishing feature of SI systems is self-organization: with
the low-level interactions of the agents, the swarm is capable of providing global responses. Among
the most studied algorithms in this area one can mention Ant Colony Optimization (ACO) (Dorigo
et al., 2006), Particle Swarm Optimization (PSO) (Eberhart & Shi, 2001), and more recently, bee-inspired algorithms (Karaboga & Akay, 2009; Bitam et al., 2010; Ruiz-Vanoye et al., 2012; Verma
& Kumar, 2013; Karaboga et al., 2014; Agarwal et al., 2016).
In (Maia et al., 2012, 2013) the authors proposed a bee-inspired algorithm, named optBees, to
solve continuous optimization problems, and in (Masutti & de Castro, 2016a) an extension of optBees
to solve the Travelling Salesman Problem, named TSPoptBees, was presented. TSPoptBees was the
algorithm that brought in the idea and need to design a structured high-level bio-inspired framework
for solving vehicle routing problems. This framework is named VRoptBees and is the main subject
of the present paper. The framework is designed with fixed and variable parts so that the users know
what should be changed for a given type of problem to be solved.
The proposal of the VRoptBees framework includes a detailed description of how it works
(Section 2), followed by one implementation for the travelling salesman problem, named TSPoptBees
(Section 2.1), and another for the capacitated vehicle routing problem, named CVRPoptBees (Section
2.2). To assess the usefulness of the framework, its TSPoptBees and CVRPoptBees versions were
implemented using Matlab®, evaluated in terms of computational cost and quality of solutions (Section
3). The algorithms were tested using benchmark TSP and CVRP instances from the literature, and
their performance compared against the best-known solutions and also the performance of other bee-inspired approaches. The paper is concluded in Section 4 with general comments about the framework
and the performance of its implementations.
2. VROPTBEES: A BEE-INSPIRED FRAMEWORK TO
SOLVE VEHICLE ROUTING PROBLEMS
Some social species of bees clearly present characteristics and principles related to swarm intelligence,
making them a good inspiration to optimization algorithms. Some of the main collective behaviors
of bee colonies during foraging that can be highlighted as inspiration for the design of algorithms
are (Maia et al., 2013): 1) bees dance to recruit nestmates to a food source; 2) bees adjust the
exploration and recovery of food according to the colony state; 3) bees exploit multiple food sources
simultaneously; 4) there is a positive linear relationship between the number of bees dancing and the
number of bees recruited; 5) recruitment continues until a threshold number of bees is reached; 6)
the quality of the food source influences the bee dance; and 7) all bees retire at some point in time,
meaning that bees stop recruiting other bees. For a generic review of bee inspired algorithms and
applications, refer to (Karaboga & Akay, 2009; Bitam et al., 2010; Verma & Kumar, 2013; Karaboga et
al., 2014; Agarwal et al., 2016). A more recent review emphasizing the use of bee-inspired algorithms
to solve vehicle routing problems was published by Masutti and de Castro (2017).
2.1. The VROptBees Framework: General Overview
VRoptBees is a framework inspired by the collective behavior of bees for solving vehicle routing
problems. The framework is sufficiently flexible to facilitate its applicability to a large number of
problems, mainly targeting those in the category of vehicle routing. To achieve this goal, VRoptBees
has fixed parts, which do not change depending on the problem to be solved, and variable parts, which
require changes using operators specific to the problem. The structure of the framework is presented
in the pseudocode of Figure 1, in which the steps shaded in gray have variable parts. Each step of
the framework is described in a following section and the variable parts are highlighted. For ease of
reading, the description of each step is divided into topics with title identical to that shown in the
pseudocode, including numbering:
1. Define the Encoding Scheme: In this step, one must define the structure of a bee, that is, how it
represents a solution to the vehicle routing problem being solved. This step is important because
all step operators that contain variable parts must be implemented following this representation.
2. Define the Cost Function: One must use the representation defined in the previous step to
define a function that calculates the cost of a solution. The cost can represent the most relevant
metric for the problem being solved, such as the distance traveled, the time of travel, the fuel
consumption, among others. The variable part to be implemented for this step is a function that
receives a bee (candidate solution) and determines the cost of the solution it represents.
3. Initialization: The initialization step consists of generating the initial swarm with nInit bees. This
can be done at random or using a set of route construction heuristics. This step is a variable part
and should be implemented considering the characteristics of the problem to be solved.
Figure 1. Pseudocode of the VRoptBees framework
4. Stopping Criterion: The stopping criterion is a fixed part of VRoptBees. The search process is
terminated when there is no improvement of the best solution during maxIt iterations.
5. Determine the Recruiters: At this stage, recruiter bees are probabilistically selected in such a
way that the lower the cost of their solution, the greater the likelihood of becoming a recruiter.
The probability pi of bee i becoming a recruiter is given by:

$$p_i = \frac{0.9}{sc_{max} - sc_{min}} \cdot (sc_{max} - sc_i) + 0.1 \qquad (5)$$
In which scmax and scmin represent, respectively, the highest and lowest costs among the solutions represented by all bees, sci is the cost of the solution associated with bee i, and 0.1 is a value added to avoid a zero-valued probability.
Each recruiting bee is then evaluated in ascending order according to the cost of its solution,
and the other recruiters who are within a radius ρ, defined as social inhibition radius, are no longer
considered as recruiters. In other words, given the distance di,j between two recruiting bees i and j, if
di,j < ρ, then bee j is no longer a recruiter, assuming that sci < scj.
The calculation of the distance between two bees is a variable part of the framework. It must be
implemented as a function that receives bees i and j, and returns the value di,j of the distance between
them. This distance can be calculated based on the distance between the bees (what can be called
genotype distance) or the distance between the quality of the solutions presented by the bees (what
can be called phenotypic distance).
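A minimal sketch of this step, assuming a list of solution costs and a distance function between bees (an illustration, not the authors' implementation):

```python
import random

def determine_recruiters(costs, dist, rho):
    """Probabilistic recruiter selection (Equation 5) followed by
    social inhibition within radius rho."""
    sc_min, sc_max = min(costs), max(costs)
    span = (sc_max - sc_min) or 1.0            # avoid division by zero
    p = [0.9 / span * (sc_max - sc) + 0.1 for sc in costs]   # Equation 5
    chosen = [i for i in range(len(costs)) if random.random() < p[i]]
    chosen.sort(key=lambda i: costs[i])        # evaluate cheapest first
    recruiters = []
    for i in chosen:                           # social inhibition
        if all(dist(i, j) >= rho for j in recruiters):
            recruiters.append(i)
    return recruiters
```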
6. Adjust the Number of Active Bees: In this step, the swarm size is updated by enabling or
disabling the bees when necessary. Let nr be the number of recruiters defined in the previous step,
nact the number of active bees in the current iteration and nm the desired number of non-recruiters
(recruited and scout) for each recruiter. Thus, the value nd = (nr + 1)*nm determines the expected
number of active bees at each iteration. If nd > nact, then nad = nd − nact is the number of bees that
have to become active. If nd < nact, then nad = nact − nd is the number of bees that must be deactivated. When a bee is deactivated, the one associated with the most costly solution is removed from the swarm; when necessary, nad bees are removed using this criterion.
The way a bee is activated is a variable part in the framework. This should be implemented as a
function that creates nad bees using the desired method.
7. Evaluate the Quality of the Food Sources: The quality of the food source is related to the cost
of the route associated with this bee. As the algorithm seeks to maximize the quality of food
sources, the cost of the solution cannot be directly used as a quality factor. In this step, a function
is used to transform the cost of all solutions into a quality factor such that the lower the cost, the
higher the quality. The quality qi of the solution associated with bee i is calculated as:
$$q_i = 1 - \frac{sc_i - sc_{min}}{sc_{max} - sc_{min}} \qquad (6)$$
In which sci is the cost of the solution associated with bee i, and scmin and scmax are, respectively,
the lowest and the highest costs among all bees in the swarm. This is a fixed step in the framework.
8. Determine the Recruited and Scout Bees: In this stage the recruited and scout bees are
determined. In addition, the recruiters for each recruited are also defined, and the number of
recruited per recruiter is proportional to the quality of the recruiter’s food source. Let nnr = nact − nr be the number of non-recruiting bees. The number of bees recruited at each iteration is defined by:
$$n_{rtd} = \mathrm{round}(p_{rec} * n_{nr}) \qquad (7)$$
In which round (·) is a function that determines the nearest integer and prec is the percentage of
non-recruiters that will be recruited. Therefore, the number of scout bees is defined by nex = nnr − nrtd.
Let Qr be the sum of the qualities of the food sources of all recruiting bees. The number nri of bees recruited for recruiter bee i is defined as:

$$n_{r_i} = \mathrm{round}\!\left(n_{rtd} \cdot \frac{q_i}{Q_r}\right) \qquad (8)$$
With this value set, non-recruiting bees are associated with recruiters using a roulette selection
method. The remaining nex bees are then labeled as scout. This is a fixed step of the framework.
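A sketch combining Equations (6)-(8) with the roulette association just described (names hypothetical):

```python
import random

def allocate_recruits(costs, recruiters, n_act, p_rec):
    """Quality of the food sources (Eq. 6), total recruited bees (Eq. 7)
    and per-recruiter share (Eq. 8), then roulette-style association."""
    sc_min, sc_max = min(costs), max(costs)
    span = (sc_max - sc_min) or 1.0
    q = [1 - (sc - sc_min) / span for sc in costs]             # Equation 6
    n_nr = n_act - len(recruiters)                             # non-recruiters
    n_rtd = round(p_rec * n_nr)                                # Equation 7
    Qr = sum(q[i] for i in recruiters)
    share = {i: round(n_rtd * q[i] / Qr) for i in recruiters}  # Equation 8
    # roulette wheel: better food sources attract more recruited bees
    wheel = [i for i in recruiters for _ in range(max(share[i], 0))]
    return [random.choice(wheel) for _ in range(n_rtd)] if wheel else []
```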
9. Perform the Recruitment: Recruitment consists of exploring new sources of food in the
neighborhood of the recruiters by attracting scouts. In VRoptBees, this is performed by
combining the solutions associated with the recruiter and the recruited, creating a new solution
in this neighborhood. The combination of two solutions is a variable part of this step, and can
be implemented as a set of operators inspired by the recombination mechanisms of evolutionary
algorithms (Davis, 1999). After a new solution is created, it will be used by the recruited bee
replacing its current solution only if there is an improvement in quality. Furthermore, if the
new recruited solution is better than its recruiter, this solution is marked for the local search
step, which will be described later. In summary, this step has a variable part that consists of
implementing one or more operators such as the recombination of evolutionary algorithms. Each
of the implemented operators must receive two solutions and create a new one.
10. Perform Exploration: At this stage the scout bees look for new sources of food. The way to
create a new solution is a variable part in the framework, and can be implemented as a single or
a set of operators that create a new solution from scratch or modify existing solutions, such as
the evolutionary algorithm mutation operators.
11. Perform Local Search: The local search stage is used to improve promising candidate solutions.
This is a variable step in the framework and can be implemented with a single or a set of local
search algorithms. As described in the recruitment stage, only a few bees are selected for the local search, namely those for which, during recruitment, the new recruited solution is of a better quality than that of their recruiter.
In the following sections implementations of VRoptBees variable parts are discussed to solve two
vehicle routing problems: 1) traveling salesman problem (TSP); and 2) capacitated vehicle routing
problems (CVRP).
2.2. VRoptBees for Solving the TSP
This section describes the necessary implementations of the variable parts of the VRoptBees
framework so that it can be applied to solve the traveling salesman problem. It should be noted that
the proposals to be presented here are examples of possible implementations for the TSP, but various
others could be used for the same problem. Only the steps with variable parts are highlighted and,
for ease of reading, the description of each step is separated as a topic with the same title as shown
in the pseudocode of Figure 1.
1. Define the Encoding Scheme: A solution is represented by a permutation of n integers, as
illustrated in Figure 2, where n is the number of cities of the TSP instance.
2. Define the Cost Function: For TSP, the cost of a solution is the distance traveled by the salesman.
Using the representation described above, the cost function is the one presented in Equation (4).
3. Initialization: The variable part of this step aims to generate nInit solutions for the bees in the
initial swarm, what can be performed in many ways, such as (Johnson & McGeoch, 1997):
a. An initial solution is generated using a Greedy Algorithm, which inserts edges into the
routes from the lowest to the highest cost ones, until all cities have been visited.
b. An initial solution is generated by the Cheapest Insertion Algorithm, in which cities are
added to the route in the position that implies the least cost.
c. A number of initial candidate solutions is generated using a Nearest Neighbor Algorithm, in
which the route starts from any city and repeatedly chooses the next city based on its closeness
to the current one.
d. If necessary, the remaining initial solutions are created by randomly choosing already created
solutions and applying one of the exploration operators to be described later.
4. Determine the Recruiters: In this step, the variable part is a function that calculates the distance
between two bees. In the current implementation, this distance is calculated as a function of the
cost of the solutions (phenotype distance):
$$d_{i,j} = \frac{\mathrm{abs}(sc_i - sc_j)}{sc_{avg}} \qquad (9)$$
where abs (·) is the function that returns the absolute value of its argument and scavg is the mean cost
between solutions of all bees.
5. Adjust the Number of Active Bees: In this step, the aim of the variable part is to implement a function that creates nad new bees to be added to the swarm. In this implementation, nad existing bees are randomly chosen and, before copies of them are included in the swarm, an exploration operator is applied to each of them, generating new solutions.
6. Perform the Recruitment: The variable part in the recruitment step must implement one or
more recombination operators. For TSP, the following three operators were used:
Type 1: 50% to 80% of the solution associated with the recruiter bee is copied as a single fragment into the final solution, while the other cities are taken from the recruited bee.
Type 2: Similar to Type 1, but using all cities sequentially from the position of a randomly
chosen city.
Type 3: Similar to Type 1, but copies two distinct fragments from the recruiting bee.
For each pair of recruiting and recruited bees, a single one of the operators described above is
randomly chosen to be used, with equal probability.
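A possible reading of the Type 1 operator as code (a sketch; the exact fragment placement is not fixed by the paper):

```python
import random

def recombine_type1(recruiter, recruited):
    """Copy a contiguous fragment covering 50%-80% of the recruiter's tour,
    then fill the remaining positions with the missing cities in the order
    they appear in the recruited bee's tour."""
    n = len(recruiter)
    size = random.randint(int(0.5 * n), int(0.8 * n))
    start = random.randrange(n - size + 1)
    child = [None] * n
    child[start:start + size] = recruiter[start:start + size]
    used = set(recruiter[start:start + size])
    fill = (city for city in recruited if city not in used)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child
```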
7. Perform the Exploration: In this step, the variable part consists of implementing a single or a
set of operators to generate new candidate solutions, as follows:
Figure 2. Candidate solution representation for a 7-city TSP instance
◦◦ Permutation of cities: two cities are randomly selected and their positions are exchanged.
◦◦ City insertion: a city is randomly selected and inserted into a new random location.
◦◦ Sub-route inversion: two cities are randomly chosen and the sub-route between them is inverted.
◦◦ 3opt move: a 3opt move is performed by randomly selecting the breaking and joining points.
For each scout bee, a candidate solution is randomly chosen from the current swarm and goes
through one of the operators described above, chosen at random with equal probability.
8. Perform Local Search: For this implementation, the local search algorithm 2opt (Johnson & McGeoch, 1997) is used. Each execution of 2opt is limited to a maximum of 2*n^2 iterations, in which n is the number of cities of the instance being solved. This number of iterations was empirically defined as a balance between computational effort and solution quality.
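A standard 2opt local search consistent with this description might be sketched as follows; the iteration cap mirrors the 2*n^2 limit mentioned above:

```python
def two_opt(route, C, max_iters):
    """Repeatedly reverse a sub-route whenever doing so shortens the tour,
    stopping after max_iters edge evaluations or when no move improves."""
    n, it, improved = len(route), 0, True
    while improved and it < max_iters:
        improved = False
        for i in range(n - 1):
            # skip the pair that would re-create the same tour (i == 0, j == n - 1)
            for j in range(i + 2, n - (1 if i == 0 else 0)):
                it += 1
                a, b = route[i], route[i + 1]
                c, d = route[j], route[(j + 1) % n]
                if C[a][c] + C[b][d] < C[a][b] + C[c][d]:
                    route[i + 1:j + 1] = reversed(route[i + 1:j + 1])
                    improved = True
    return route
```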
2.3. VRoptBees for Solving the CVRP
This section describes some implementations required for the variable parts of VRoptBees to be applied
to the CVRP (Capacitated Vehicle Routing Problem). It should be noted that these are examples of
implementations for the CVRP, and others can be used for tackling the same problem.
1. Define the Encoding Scheme: Unlike the TSP, which has a single closed route, a solution for
the CVRP has multiple routes, one for each vehicle. In this implementation, a candidate solution
is represented by a structure with K integer permutation vectors, where K is the number of
vehicles used in the solution. Each vector represents the route of each vehicle. As it is assumed
that vehicles always start and end their route in the depot, the depot index does not appear in the
permutation vector, as illustrated in Figure 3.
2. Define the Cost Function: A standard formulation of the capacitated vehicle routing problem was
presented in Equation (1). As the representation of a solution and the operators used throughout
the search process allow a vehicle to exceed its maximum load capacity, this restriction enters
as a penalty in calculating the cost of the solution. The demand of customers visited by vehicle
k, demk, is given by:
Figure 3. Candidate solution representation for a CVRP instance with 3 vehicles and 10 customers
$$dem_k = \sum_{i=1}^{n_k} q_{\pi_{k,i}} \qquad (10)$$
In which q is the vector with the demands of all customers of the instance being solved.
With this, the calculation of the penalty to the solution cost, penk, of vehicle k is given by:
$$pen_k = \begin{cases} \dfrac{dem_k}{Q} \cdot sc_k, & dem_k > Q \\ 0, & dem_k \le Q \end{cases} \qquad (11)$$
where Q is the maximum capacity of the vehicle.
With these settings, the value sct of the total cost of the solution is:
$$sc_t = \sum_{k=1}^{K} (sc_k + pen_k) \qquad (12)$$
where K is the total number of vehicles used in the solution.
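Equations (10)-(12) can be transcribed directly; the sketch below assumes a cost matrix C, a demand vector q and routes given as lists of customer indices (names hypothetical):

```python
def cvrp_cost(routes, C, q, depot, Q):
    """Total CVRP solution cost: route costs plus the overload penalty of
    Equation (11) for vehicles whose demand exceeds the capacity Q."""
    total = 0.0
    for route in routes:                        # one permutation per vehicle
        path = [depot] + route + [depot]        # vehicles start/end at the depot
        sc_k = sum(C[a][b] for a, b in zip(path, path[1:]))
        dem_k = sum(q[i] for i in route)                    # Equation (10)
        pen_k = (dem_k / Q) * sc_k if dem_k > Q else 0.0    # Equation (11)
        total += sc_k + pen_k                               # Equation (12)
    return total
```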
3. Initialization: The initial solutions are created with the same route creation algorithms used for
the TSP, with some additional steps at the end:
a. The CVRP instance is transformed into a TSP one, transforming the depot and the customers
into cities.
b. nInit TSP solutions for this temporary instance are created using the same approach described
for the TSP. See example in Figure 4(a) and (b).
c. For each solution, the depot is added K − 1 times at random positions, thus generating K
sub-routes. See the example in Figure 4(c).
d. The solution is then transformed into the representation used, creating the structure with K
vectors and removing the depot from them, as illustrated in Figure 4(d) and (e).
4. Determine the Recruiters: The variable part in this step is the calculation of the distance between
two bees. In this implementation, the same approach described for the TSP was used. This means
that the distance between two bees is calculated using the cost of the solutions associated with
them, according to Equation (9).
5. Adjust the Number of Active Bees: At this stage the same approach used for the TSP is
maintained, with the difference that the exploration operators are specific to the CVRP, as will
be explained further.
6. Perform the Recruitment: For the CVRP, three recruitment operators were implemented:
Type 1: This operator tries to maintain the order of visits to the customers using the sub-routes
as a basis. 50% to 80% of each sub-route of the solution associated with the recruiter bee is
copied into the final solution. The remaining customers are copied from the recruited bee’s
solution. In this case, the size of each sub-route in the final solution is the same as that of
the recruiter. See an example in Figure 5.
Type 2: This operator seeks to maintain the order of visits to the customers by temporarily treating
the solution as a single route. 50% to 80% of the recruiter solution is copied as a single
fragment into the final solution and the remaining customers are taken from the recruited
bee. In this case, the size of each sub-route in the final solution is the same as that of the
recruiter. See an example in Figure 6.
Figure 4. Examples of initialization schemes for a CVRP instance with 9 customers, 1 depot, and 3 vehicles
Figure 5. Example of Type 1 recruitment operator and its final solution with sz1 = 1, sz2 = 3, sz3 = 2 and sz4 = 2
Type 3: This operator seeks to maintain the order of visits to the customers using a sub-route
of the recruited bee as the base. A sub-route is chosen randomly from the recruited bee
and copied into the final solution. The remaining sub-routes are built using the recruiter’s
remaining customers, in order of occurrence. In this case, the size of each sub-route in the
final solution is the same as that of the recruited bee. See an example in Figure 7.
Figure 6. Example of Type 2 recruitment operator and its final solution
For each pair of recruiter and recruited bees, only one of the operators described above is
applied, chosen at random with equal probability.
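A sketch of the Type 2 operator is given below. It assumes both parents visit the same customer set, and placing the copied fragment at the start of the child is a simplification of where the fragment lands in the final solution:

```python
import random

def recruit_type2(recruiter, recruited):
    """Type 2 recruitment sketch: copy a contiguous 50-80% fragment of the
    recruiter (seen as a single route), fill the rest in the recruited bee's
    visiting order, and re-split using the recruiter's sub-route sizes."""
    sizes = [len(r) for r in recruiter]
    flat_rec = [c for r in recruiter for c in r]
    flat_red = [c for r in recruited for c in r]
    n = len(flat_rec)
    frag_len = random.randint(int(0.5 * n), int(0.8 * n))
    start = random.randrange(n - frag_len + 1)
    fragment = flat_rec[start:start + frag_len]
    frag_set = set(fragment)
    child = fragment + [c for c in flat_red if c not in frag_set]
    routes, pos = [], 0
    for s in sizes:                       # sub-route sizes follow the recruiter
        routes.append(child[pos:pos + s])
        pos += s
    return routes
```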
7. Perform the Exploration: For the CVRP, the following exploration operators were implemented:
• Shuffle Customers in a Sub-Route: A sub-route and two positions within it are randomly chosen, and the customer order between these positions is shuffled, as illustrated in Figure 8(a).
• Insert a Customer into a New Sub-Route: Two sub-routes are randomly chosen, one as the origin and the other as the destination. A randomly chosen customer is removed from the origin sub-route and inserted into the destination sub-route at a randomly defined position, as illustrated in Figure 8(b) and sketched in code after this list.
• Insert a Customer Set into a New Sub-Route: This operator is similar to the previous one, but a set of customers is moved from one sub-route to another. The size of this set is limited to 50% of the origin sub-route’s customers. An example of the result of this operator is shown in Figure 8(c).
• Switch Customers Between Sub-Routes: Two sub-routes are randomly chosen and in each one a customer is chosen at random. These customers are then switched between the sub-routes, as illustrated in Figure 8(d).
• Switch a Customer Set Between Sub-Routes: This operator is similar to the previous one, but a set of customers is exchanged between the sub-routes. The size of this set is limited to 50% of the customers of each of the sub-routes, as illustrated in Figure 8(e).
Figure 7. Example of Type 3 recruitment operator and its final solution with krd = 2
Figure 8. Exploration operators outcome examples using the same bee as a basis
For each scout bee, a solution is chosen randomly from the current swarm and its new solution
is created after going through one of the operators described above, chosen at random.
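As promised above, here is a sketch of the second operator (customer insertion between sub-routes); the other moves follow the same copy-and-mutate pattern. It assumes at least two sub-routes exist:

```python
import random

def insert_customer(routes):
    """Move one random customer from an origin sub-route to a random
    position in a destination sub-route (Figure 8(b) style move)."""
    routes = [r[:] for r in routes]                  # work on a copy
    origin, dest = random.sample(range(len(routes)), 2)
    if not routes[origin]:
        return routes                                # nothing to move
    customer = routes[origin].pop(random.randrange(len(routes[origin])))
    routes[dest].insert(random.randint(0, len(routes[dest])), customer)
    return routes
```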
8. Perform Local Search: For the CVRP, two local search algorithms were implemented:
• Hill Climbing: Hill climbing consists of iteratively changing a solution and using it as the current solution only when an improvement occurs (Michalewicz & Fogel, 2013). In this implementation, the exploration operators described earlier are used to make the modifications to the current solution. The algorithm is limited to 1,000 iterations per run.
• 2opt: The same 2opt implementation used for the TSP is run on all sub-routes of the bee selected for the local search step.
For each bee selected for the local search during the exploration step, the two local search
algorithms are executed in sequence.
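The hill-climbing loop described above can be sketched as follows, parameterized by the cost function and the exploration operators; `cost_fn` and `perturb_fns` are illustrative names:

```python
import random

def hill_climb(routes, cost_fn, perturb_fns, max_iters=1000):
    """Keep a perturbed solution only when it improves the cost."""
    best, best_cost = routes, cost_fn(routes)
    for _ in range(max_iters):                   # 1,000 iterations per run
        cand = random.choice(perturb_fns)(best)  # random exploration operator
        cand_cost = cost_fn(cand)
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost
```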
3. PERFORMANCE ASSESSMENT
This section describes the experiments made to assess the performance of VRoptBees for two classes
of problems: 1) the traveling salesman problem; and 2) the capacitated vehicle routing problem. For
the TSP, two test batteries were performed: 1) an assessment of the computational cost as a function
of the number of cities in the instance; and 2) an assessment of the quality of the solutions. For the
CVRP, two other test batteries were made: 1) an assessment of the computational cost in terms of
the number of customers; and 2) an assessment of the quality of the solutions.
All tests performed for the TSP used instances from the TSPLIB (Reinelt, 1991), which is a
library of benchmark TSP instances broadly used in the literature. For the symmetric TSP instances,
in which the cost of moving from city i to city j is the same as the one of moving from city j to i,
TSPLIB has 144 instances available with the optimal solution known, which is referenced here as
the Best Known Solution (BKS). In total, 28 instances, summarized in Table 1, ranging from 48 to
439 cities, were used to assess VRoptBees applied to the TSP.
To evaluate the performance of VRoptBees applied to the CVRP, we used benchmark instances
published in two works of the literature: Christofides and Eilon (1969), and Christofides et al. (1979).
These two libraries are commonly used to evaluate the quality of algorithms that solve CVRP. In total,
the two libraries have 21 instances, ranging in size from 13 to 200 customers, including the depot.
As for the TSPLIB, the instances of these libraries have the optimal solution known, also referred to
as the BKS. The instances used in the experiments are presented in Table 2.
The proposed framework was implemented in MATLAB and executed on an Intel Core i5 1.9
GHz with 8GB RAM. For each instance, the algorithm was executed 30 times. Based on the sensitivity
analysis of the TSP version of VRoptBees presented in Masutti & de Castro (2016b), the following
values were chosen for the VRoptBees parameters: nInit = n if n ≤ 100, or nInit = 100 if n > 100; nm
= 22; prec = 0.75; ρ = 0.1; and maxIt = 1,400.
3.1. Results for the Traveling Salesman Problem
The following are the tests performed with VRoptBees for the TSP. The presentation and discussion
of the results are segmented according to the purpose of the test, as previously described.
3.1.1. Assessing the Computational Cost Based on the Number of Cities
The computational cost analysis of the TSP version of VRoptBees, named TSPoptBees, was done empirically by executing the algorithm for two TSPLIB instances: fl417 and pr439. Each was divided into 10 additional instances (20 in total, 10 for each original TSPLIB instance), varying in size according to the sets [25 50 75 100 150 200 250 300 400 417] and [25 50 75 100 150 200 250 300 400 439], respectively.
Table 1. TSP instances used to assess the performance of VRoptBees applied to the TSP

| Instance | # of cities | BKS | Instance | # of cities | BKS |
|----------|-------------|--------|----------|-------------|--------|
| att48    | 48          | 10628  | pr107    | 107         | 44303  |
| eil51    | 51          | 426    | pr124    | 124         | 59030  |
| berlin52 | 52          | 7542   | bier127  | 127         | 118282 |
| st70     | 70          | 675    | pr136    | 136         | 96772  |
| eil76    | 76          | 538    | kroA150  | 150         | 26524  |
| pr76     | 76          | 108159 | kroB150  | 150         | 26130  |
| kroA100  | 100         | 21282  | rat195   | 195         | 2323   |
| kroB100  | 100         | 22141  | kroA200  | 200         | 29368  |
| kroC100  | 100         | 20749  | kroB200  | 200         | 29437  |
| kroD100  | 100         | 21294  | tsp225   | 225         | 3916   |
| kroE100  | 100         | 22068  | a280     | 280         | 2579   |
| rd100    | 100         | 7910   | lin318   | 318         | 42029  |
| eil101   | 101         | 629    | fl417    | 417         | 11861  |
| lin105   | 105         | 14379  | pr439    | 439         | 107217 |
Table 2. CVRP instances used to assess the performance of VRoptBees applied to the CVRP

| Instance   | # of customers | # vehicles | Capacity | dem/cap | BKS  |
|------------|----------------|------------|----------|---------|------|
| E-n22-k4   | 21             | 4          | 6000     | 0.94    | 375  |
| E-n23-k3   | 22             | 3          | 4500     | 0.75    | 569  |
| E-n30-k3   | 29             | 3          | 4500     | 0.94    | 534  |
| E-n33-k4   | 32             | 4          | 8000     | 0.92    | 835  |
| E-n51-k5   | 50             | 5          | 160      | 0.97    | 521  |
| E-n76-k7   | 75             | 7          | 220      | 0.89    | 682  |
| E-n76-k8   | 75             | 8          | 180      | 0.95    | 735  |
| E-n76-k10  | 75             | 10         | 140      | 0.97    | 830  |
| E-n76-k14  | 75             | 14         | 100      | 0.97    | 1021 |
| E-n101-k8  | 100            | 8          | 200      | 0.91    | 815  |
| E-n101-k14 | 100            | 14         | 112      | 0.93    | 1067 |
| vrpnc4     | 150            | 12         | 200      | 0.93    | 1053 |
| vrpnc5     | 199            | 17         | 200      | 0.94    | 1373 |
| vrpnc11    | 120            | 7          | 200      | 0.98    | 1034 |
| vrpnc12    | 100            | 10         | 200      | 0.91    | 820  |
Using city sets from the same instance for this test produced more reliable results than
using different instances with different sizes, since each instance may have particular characteristics
that influence the convergence of the search process. Two different instances, but of similar sizes,
were used to reduce the bias due to the characteristics of a single instance.
For each instance described above TSPoptBees was run 30 times and only the characteristics
related to the computational effort of the algorithm were used in the analysis of the results. These
characteristics are: 1) the average number of iterations for convergence; 2) the mean size of the final
swarm; and 3) the average running time. It is worth mentioning that the obtained results are specific
to the implementation used, since each operator can present a different computational complexity.
Table 3 presents the results obtained for the two instances used. For each instance, the table reports the average number of iterations for convergence of the search process, the average number of active bees in the final swarm, and the average execution time of the algorithm.
Table 3. TSPoptBees computational cost evaluation for the TSP. n is the number of cities in the instance; Iterations is the average number of iterations for convergence; nact is the average number of bees in the final swarm; Time is the average time, in seconds, for running the algorithm

| n   | fl417: Iterations | fl417: nact | fl417: Time | pr439: Iterations | pr439: nact | pr439: Time |
|-----|-------------------|-------------|-------------|-------------------|-------------|-------------|
| 25  | 1402.63           | 170.13      | 14.98       | 1402.63           | 207.53      | 19.10       |
| 50  | 1657.60           | 203.87      | 28.50       | 1560.57           | 242.73      | 34.06       |
| 75  | 2034.80           | 253.73      | 55.55       | 2170.07           | 259.60      | 63.78       |
| 100 | 2402.63           | 277.93      | 95.29       | 2234.60           | 264.00      | 83.72       |
| 150 | 3110.63           | 311.67      | 195.43      | 3692.37           | 287.47      | 200.83      |
| 200 | 4182.17           | 369.60      | 381.15      | 4065.37           | 282.33      | 293.47      |
| 250 | 4927.83           | 426.07      | 617.87      | 5299.03           | 300.67      | 474.90      |
| 300 | 5636.97           | 473.73      | 1006.08     | 4742.80           | 309.47      | 576.66      |
| 400 | 6085.43           | 486.20      | 1726.61     | 4821.03           | 321.20      | 961.55      |
| 417 | 5746.30           | 495.00      | 1638.32     | -                 | -           | -           |
| 439 | -                 | -           | -           | 7591.30           | 289.67      | 1719.78     |
According to the results, the number of active bees in the final swarm tends to increase with the number of cities in the instance. However, this growth seems to stabilize beyond a certain value of n. This behavior is observed for both instances used in this experiment. For the fl417 instance the growth is more accentuated for smaller values of n, and it seems to stabilize around n = 300. For the pr439 instance the growth also exists, but in a milder fashion. An interesting point is that the value of nact decreased for instance pr439 between n = 400 and n = 439 (the actual instance size).
For the number of iterations for convergence (Iterations in Table 3), the behavior is also similar for the two instances. For 25 ≤ n ≤ 250 the number of iterations is very close between the two instances, presenting a marked increase. Beyond this value the two instances behave differently. For fl417 there is a smaller growth between n = 250 and n = 400, and a slight decrease for n = 417 (the actual instance size). For pr439 there is a decrease in the number of iterations between n = 250 and n = 300, an almost constant behavior between 300 and 400, and an abrupt increase between 400 and 439.
The runtime has similar characteristics for the two instances and can be seen as a composition of the two attributes described above. For the two instances the values are very similar between n = 25 and n = 150, showing a steady growth in the execution time. From this point on there is a more pronounced growth for the fl417 instance, which follows a behavior similar to that seen for the number of bees; however, the runtime decreases between n = 400 and n = 417, the same behavior seen for the number of iterations. For the pr439 instance there is a less pronounced growth, which changes noticeably between n = 400 and n = 439 with an abrupt increase, mirroring the number of iterations.
The results show that, with respect to the computational effort, the algorithm can present different values for different instances. However, a behavioral trend that is independent of the instance being solved can be observed. Figure 9 shows the runtime-related trend for the two instances: the running time can be approximated by a second-degree polynomial.
Figure 9. Running time of TSPoptBees for two TSP instances: (a) fl417 and (b) pr439
Given the factorial nature of the problem, with the number of possible solutions increasing on a factorial scale with the number of cities, this tendency of quadratic runtime growth is a positive result.
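This trend can be checked directly from the fl417 times in Table 3 with a least-squares fit; a second-degree polynomial follows the measured values closely:

```python
import numpy as np

# Running times (s) of the fl417 sub-instances, taken from Table 3
n = np.array([25, 50, 75, 100, 150, 200, 250, 300, 400, 417])
t = np.array([14.98, 28.50, 55.55, 95.29, 195.43, 381.15,
              617.87, 1006.08, 1726.61, 1638.32])

coeffs = np.polyfit(n, t, 2)           # least-squares second-degree fit
residual = t - np.polyval(coeffs, n)   # deviation of each point from the fit
print(coeffs, np.abs(residual).max())
```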
3.1.2. Assessing the Quality of the Solutions
For each of the 28 instances the algorithm was run 30 times. The results obtained are presented in Table 4. For each instance the result is presented with the following attributes: 1) the cost of the best solution found (Best); 2) the number of times the best known solution (BKS) was found; 3) the average cost of the best solution found; 4) the percentage deviation of the best solution found from the BKS (PDb); 5) the percentage deviation of the mean solution from the BKS (PDav); 6) the average number of iterations for convergence; 7) the average number of bees in the final swarm; and 8) the average time (Time), in seconds, of the algorithm execution.
Table 4. Computational results of TSPoptBees

| Instance | BKS    | Best   | #BKS | Average   | PDb  | PDav | Iterations | # bees | Time    |
|----------|--------|--------|------|-----------|------|------|------------|--------|---------|
| att48    | 10628  | 10628  | 15   | 10646.20  | 0.00 | 0.17 | 2159.50    | 180.40 | 49.45   |
| eil51    | 426    | 427    | 0    | 427.37    | 0.23 | 0.32 | 1954.63    | 137.87 | 40.13   |
| berlin52 | 7542   | 7542   | 29   | 7545.83   | 0.00 | 0.05 | 1486.20    | 161.33 | 33.56   |
| st70     | 675    | 675    | 8    | 679.73    | 0.00 | 0.70 | 2270.53    | 186.27 | 78.49   |
| eil76    | 538    | 538    | 12   | 541.20    | 0.00 | 0.59 | 3322.10    | 168.67 | 98.24   |
| pr76     | 108159 | 108159 | 23   | 108368.90 | 0.00 | 0.19 | 2426.93    | 192.13 | 81.91   |
| kroA100  | 21282  | 21282  | 16   | 21296.63  | 0.00 | 0.07 | 2061.30    | 219.27 | 86.89   |
| kroB100  | 22141  | 22141  | 17   | 22176.93  | 0.00 | 0.16 | 2575.17    | 225.13 | 123.98  |
| kroC100  | 20749  | 20749  | 16   | 20797.33  | 0.00 | 0.23 | 2706.03    | 223.67 | 132.27  |
| kroD100  | 21294  | 21294  | 3    | 21456.37  | 0.00 | 0.76 | 2992.10    | 238.33 | 139.68  |
| kroE100  | 22068  | 22106  | 0    | 22144.40  | 0.17 | 0.35 | 2627.53    | 226.60 | 122.91  |
| rd100    | 7910   | 7910   | 3    | 7997.07   | 0.00 | 1.10 | 2557.03    | 214.13 | 132.98  |
| eil101   | 629    | 629    | 8    | 630.93    | 0.00 | 0.31 | 2606.13    | 179.67 | 91.90   |
| lin105   | 14379  | 14379  | 29   | 14380.23  | 0.00 | 0.01 | 2352.03    | 250.80 | 129.81  |
| pr107    | 44303  | 44303  | 18   | 44345.60  | 0.00 | 0.10 | 1853.17    | 319.73 | 183.47  |
| pr124    | 59030  | 59030  | 20   | 59107.07  | 0.00 | 0.13 | 2296.20    | 309.47 | 181.00  |
| bier127  | 118282 | 118282 | 4    | 118650.20 | 0.00 | 0.31 | 2901.80    | 167.20 | 120.26  |
| pr136    | 96772  | 96785  | 0    | 97671.70  | 0.01 | 0.93 | 4030.93    | 225.13 | 275.90  |
| kroA150  | 26524  | 26583  | 0    | 26827.27  | 0.22 | 1.14 | 3586.33    | 243.47 | 291.64  |
| kroB150  | 26130  | 26130  | 3    | 26237.90  | 0.00 | 0.41 | 4075.33    | 242.00 | 371.65  |
| rat195   | 2323   | 2331   | 0    | 2352.53   | 0.34 | 1.27 | 3724.30    | 247.87 | 538.12  |
| kroA200  | 29368  | 29368  | 1    | 29527.93  | 0.00 | 0.54 | 4063.77    | 282.33 | 564.80  |
| kroB200  | 29437  | 29489  | 0    | 29771.77  | 0.18 | 1.14 | 4489.53    | 282.33 | 464.05  |
| tsp225   | 3916   | 3916   | 1    | 3984.37   | 0.00 | 1.75 | 4743.40    | 254.47 | 711.53  |
| a280     | 2579   | 2579   | 3    | 2613.50   | 0.00 | 1.34 | 4424.30    | 268.40 | 939.57  |
| lin318   | 42029  | 42196  | 0    | 42589.33  | 0.40 | 1.33 | 6341.37    | 267.67 | 1428.00 |
| fl417    | 11861  | 11902  | 0    | 12003.13  | 0.35 | 1.20 | 5907.13    | 492.80 | 3026.35 |
| pr439    | 107217 | 107685 | 0    | 110352.80 | 0.44 | 2.92 | 6524.40    | 299.20 | 2651.69 |
Evaluating the best solution obtained for each of the instances in the 30 runs (Best and PDb in Table 4), TSPoptBees was able to find the best-known solution at least once for 19 of the 28 TSP instances. Out of these 19 instances, the BKS was found more than once for 17 of them. Among the 9 instances for which the BKS was not found, the highest value of PDb was 0.44%, for the pr439 instance. On average, the best solution found by TSPoptBees deviates by 0.08% from the BKS (average value of PDb).
Regarding the average solution obtained during the 30 runs for each instance (Average and PDav in Table 4), TSPoptBees was not able to find the BKS in all runs for any instance. The largest deviation of the mean solution from the BKS was 2.92%, for the pr439 instance. The average solution obtained by TSPoptBees for the set of instances used in this test had a deviation of 0.70% from the BKS (mean value of PDav).
Table 5 presents a comparison of the quality of the solutions obtained by TSPoptBees with those published in works that tackle the TSP with algorithms inspired by bee behavior. The algorithms listed in this table are: 1) BS + 2opt (Lucic & Teodorovic, 2003); 2) BCO (Wong et al., 2008); 3) ABC-GSX (Banharnsakun et al., 2010); 4) ABC & PR (Zhang et al., 2011); 5) CABC (Karaboga & Gorkemli, 2011); 6) BCOPR (Girsang et al., 2012); and 7) ABC + 2opt (Kiran et al., 2013).

Table 5. Comparison of the computational results of TSPoptBees with other bee-inspired algorithms from the literature
3.2. Results for the Capacitated Vehicle Routing Problem
The following are the tests performed for the application of VRoptBees to the CVRP. The presentation
and discussion of the results are segmented according to the purpose of the test, as previously described.
3.2.1. Assessing the Computational Cost Based on the Number of Customers
The computational cost analysis of the implementation of VRoptBees for the CVRP, named CVRPoptBees, was done empirically by running the framework for two instances of the libraries selected for this problem. The customers of the instances E-n101-k8 and vrpnc12 were divided into another 20 instances, 10 for each original instance, varying in size according to the set [41 46 51 56 61 66 71 81 91 101]. The reason for using two original instances and dividing them in relation to the number of customers is the same as described for the TSP, since each CVRP instance may have different characteristics.
A specific property of CVRP instances, relative to the TSP, is the ratio of total customer demand to vehicle capacity. Values closer to 1 indicate greater utilization of vehicle capacity; the higher this value, the smaller the number of feasible solutions within the search space, increasing the complexity of the instance. Therefore, this ratio was kept constant during the experiments in question, since the objective is to evaluate the behavior of the framework only as a function of the number of customers. As the number of customers varies, the total demand of the instance changes; to maintain the same ratio between total customer demand and vehicle capacity, the vehicle capacity is adjusted for each instance so as not to have an undesirable impact on the tests. For each fraction of the original instance, the demand of each customer was maintained and the capacity of the vehicles was changed so that the ratio remained close to that of the original instance.
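Assuming the dem/cap column of Table 2 is the ratio of total customer demand to total fleet capacity (K times Q, which is consistent with the tabulated values), the capacity adjustment can be sketched as follows; the function name is illustrative:

```python
def scale_capacity(demands, n_vehicles, target_ratio):
    """Pick a vehicle capacity Q so that sum(demands) / (K * Q) stays
    close to the original instance's dem/cap ratio."""
    return sum(demands) / (n_vehicles * target_ratio)
```

For a 41-customer fraction of E-n101-k8, for example, the customer demands are kept and Q is recomputed from the original instance's ratio.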
For each instance described above the framework was executed 30 times and only the
characteristics related to the computational effort of the algorithm were used in the analysis of the
results. These characteristics are: 1) the average number of iterations for convergence; 2) the mean
size of the final swarm; and 3) the average running time. Table 6 presents the results obtained for
the two instances used. For each instance, the table reports the average number of iterations for convergence of the search process, the average number of active bees in the final swarm, and the average execution time of the framework.
Table 6. Result of the VRoptBees computational cost evaluation test for the CVRP. n is the number of customers plus the depot; Iterations is the average number of iterations for convergence; nact is the average number of bees in the final swarm; Time is the average time, in seconds, of running the framework

| n   | E-n101-k8: Iter. | E-n101-k8: nact | E-n101-k8: Time | vrpnc12: Iter. | vrpnc12: nact | vrpnc12: Time |
|-----|------------------|-----------------|-----------------|----------------|---------------|---------------|
| 41  | 2137.83          | 152.53          | 234.47          | 1990.23        | 122.47        | 181.63        |
| 46  | 2648.03          | 157.67          | 283.72          | 2524.67        | 128.33        | 243.82        |
| 51  | 2745.80          | 166.47          | 362.87          | 2633.30        | 129.80        | 274.31        |
| 56  | 3091.13          | 167.20          | 401.47          | 2966.00        | 148.13        | 369.78        |
| 61  | 2845.23          | 165.73          | 369.87          | 2736.40        | 160.60        | 364.29        |
| 66  | 3051.47          | 150.33          | 410.66          | 2749.03        | 147.40        | 406.38        |
| 71  | 3134.43          | 165.73          | 426.36          | 2806.33        | 154.73        | 392.35        |
| 81  | 3429.63          | 155.47          | 521.13          | 2685.17        | 146.67        | 377.70        |
| 91  | 3501.33          | 154.00          | 498.86          | 3165.30        | 161.33        | 418.34        |
| 101 | 3611.67          | 153.27          | 555.48          | 2880.37        | 170.13        | 477.11        |
According to the results, there were different behaviors for the two instances in relation to the
number of bees active in the final swarm (nact in Table 6). For the E-n101-k8 instance there is a swarm
growth for 41 ≤ n ≤ 71, while for n > 71 the number of bees decreases. For the vrpnc12 instance
a similar behavior occurred for n ≤ 81. For values above this threshold
the behavior was the opposite of that observed for the other instance, with the final swarm growing
as n grows.
For the number of iterations for convergence (Iterations in Table 6), the behavior is similar for
the two instances, with the number of iterations growing with the growth of n. The difference is in
the intensity of this growth, being more accentuated for the E-n101-k8 instance.
The runtime also has similar characteristics between the two instances and can be seen as a
composition of the two attributes described above. For the two instances the values are very similar,
between n = 41 and n = 66, showing a certain growth in the running time. From this point, there is
a more pronounced growth for the E-n101-k8 instance, which follows a similar behavior to that seen
for the number of iterations.
As observed for the TSP previously, for the CVRP the framework presents a similar behavior
with respect to the computational effort, regardless of the instance being solved. Figure 10 shows
the runtime trend for the two instances. It is observed that the running time can be approximated by
a third degree polynomial.
Figure 10. Running time of CVRPoptBees for two CVRP instances: (a) E-n101-k8 and (b) vrpnc12

3.2.2. Evaluating the Quality of Solutions

To evaluate the quality of the CVRPoptBees solutions, the framework was submitted to tests with 15 instances, varying in size between 21 and 199 customers and between 3 and 17 vehicles, as shown in Table 2. For each of these instances the framework was executed 30 times. The results obtained are presented in Table 7. For each instance the result is presented with the following attributes: 1) the cost of the best solution found; 2) the number of times the best known solution (BKS) was found; 3) the average cost of the best solution found; 4) the percentage deviation of the best solution found from the BKS; 5) the percentage deviation of the mean solution from the BKS; 6) the average number of iterations for convergence of the search process; 7) the average number of bees in the final swarm; and 8) the average time, in seconds, of the framework execution.
Evaluating the best solution obtained for each instance among the 30 executions (Best and PDb in Table 7), CVRPoptBees was able to find the best-known solution at least once for 4 of the 15 CVRP instances. Among the 11 instances for which the BKS was not found, the highest value of PDb was 3.49%, for instance E-n76-k10. On average, the best solution found by CVRPoptBees deviates by 1.28% from the BKS (average value of PDb).
Regarding the average solution obtained during the 30 runs for each instance (Average and PDav in Table 7), the largest deviation of the mean solution from the BKS was 6.82%, for the E-n76-k10 instance. The average solution obtained by CVRPoptBees for the set of instances used in this test had a deviation of 3.62% from the BKS (mean value of PDav).
Table 8 presents a comparison of the solutions obtained by CVRPoptBees with those of other
bee-inspired algorithms from the literature. The algorithms listed in this table are: 1) HBMOVRP
(Marinakis et al., 2008); and 2) ABC (Szeto et al., 2011). The HBMOVRP algorithm obtained the
BKS for 6 of the 7 instances used in the comparison, while the ABC achieved the BKS for 2, and
the CVRPoptBees for only 1. CVRPoptBees has obtained a better solution than ABC for only two
of the instances: vrpnc5 and vrpnc11.
Table 7. Results obtained for the implementation of CVRPoptBees. BKS is the best known solution for the instance; Best is the cost of the best solution obtained by the framework; #BKS is the number of times the BKS was found; Average is the cost of the average solution; PDb is the percentage deviation of the best solution from the BKS; PDav is the percentage deviation of the mean solution from the BKS; Iterations is the average number of iterations for convergence; # bees is the average number of bees in the final swarm; Time is the average time, in seconds, of running the framework. All results obtained after 30 executions per instance

| Instance   | BKS  | Best | #BKS | Average | PDb  | PDav | Iterations | # bees | Time   |
|------------|------|------|------|---------|------|------|------------|--------|--------|
| E-n22-k4   | 375  | 375  | 29   | 375.40  | 0.00 | 0.11 | 725.27     | 266.93 | 101.03 |
| E-n23-k3   | 569  | 569  | 29   | 569.03  | 0.00 | 0.01 | 735.13     | 217.80 | 73.20  |
| E-n30-k3   | 534  | 536  | 0    | 537.43  | 0.37 | 0.64 | 998.03     | 346.13 | 164.03 |
| E-n33-k4   | 835  | 835  | 15   | 837.83  | 0.00 | 0.34 | 1077.37    | 184.07 | 104.89 |
| E-n51-k5   | 521  | 526  | 0    | 547.30  | 0.96 | 5.05 | 2441.10    | 210.47 | 348.14 |
| E-n76-k7   | 682  | 691  | 0    | 713.10  | 1.32 | 4.56 | 2883.70    | 176.73 | 384.40 |
| E-n76-k8   | 735  | 754  | 0    | 777.40  | 2.59 | 5.77 | 3233.20    | 161.33 | 440.57 |
| E-n76-k10  | 830  | 859  | 0    | 886.60  | 3.49 | 6.82 | 3510.53    | 139.33 | 496.86 |
| E-n76-k14  | 1021 | 1040 | 0    | 1081.52 | 1.86 | 5.93 | 4081.20    | 154.00 | 654.91 |
| E-n101-k8  | 815  | 825  | 0    | 851.57  | 1.23 | 4.49 | 3668.00    | 161.33 | 543.93 |
| E-n101-k14 | 1067 | 1101 | 0    | 1131.23 | 3.19 | 6.02 | 4192.53    | 121.00 | 568.84 |
| vrpnc4     | 1053 | 1079 | 0    | 1116.53 | 2.47 | 6.03 | 4122.57    | 137.13 | 641.19 |
| vrpnc5     | 1373 | 1392 | 0    | 1432.23 | 1.38 | 4.31 | 5669.37    | 107.80 | 896.61 |
| vrpnc11    | 1034 | 1038 | 0    | 1043.83 | 0.39 | 0.95 | 2846.57    | 217.80 | 487.69 |
| vrpnc12    | 820  | 820  | 4    | 846.87  | 0.00 | 3.28 | 2698.33    | 167.93 | 383.27 |
Table 8. Comparison of the solution obtained by CVRPoptBees with other bee-inspired algorithms from the literature
4. CONCLUSION
Vehicle routing problems are of great importance in scientific research, since they present a wide set of practical applications and are difficult to solve to optimality by exact methods, being one of the most studied classes of combinatorial optimization problems. Finding optimal solutions to these problems is infeasible for large instances due to the factorial growth in the number of possible solutions.
Thus, optimal solution methods are usually replaced by heuristics capable of finding good solutions
in reasonable time. Among the many heuristics that have been applied to such problems, swarm-based
algorithms, including bee-inspired methods, have received a great deal of attention over the past decade.
Due to the relevance of vehicle routing problems and swarm intelligence algorithms, the main
contribution of this paper was to provide a framework, named VRoptBees, to design bee-inspired
algorithms for solving this class of problems. Among the eleven stages of the VRoptBees framework
between the modeling and the search process, eight variable parts were presented: 1) the representation
of the solution; 2) the cost function; 3) a set of route construction heuristics to create the initial swarm;
4) a function to calculate the distance between two bees; 5) a function to activate new bees; 6) a set
of recruitment operators; 7) a set of exploration operators; and 8) a set of local search algorithms.
To illustrate how to use the framework and assess its performance, two implementations of the variable parts were proposed to solve the following problems: 1) the traveling salesman problem (TSP); and 2) the capacitated vehicle routing problem (CVRP). The algorithms were tested using several problem instances from the literature and their results were compared to the Best Known Solutions and to other bee-inspired approaches. Both versions proved to be competitive and, most importantly, the framework proved to be generic enough to allow the design of solutions to various other combinatorial optimization problems.
For virtually all bee-inspired algorithms, a number of issues open avenues for future research in the context of combinatorial optimization problems and beyond, such as: methods to automatically define the input parameters, including self-adaptive approaches; the use of initialization, selection and local search techniques specially designed for combinatorial optimization problems; the investigation and improvement of convergence times and properties; empirical studies of the dynamics of the swarm; the inclusion of diversity control mechanisms; a better understanding of the classes of problems on which these algorithms perform better, and why; and the parallelization of the methods. Last, but not least, bee-inspired approaches have great potential to be adapted and applied to various other contexts, from pattern recognition to autonomous navigation.
ACKNOWLEDGMENT
The authors would like to thank Fapesp, CNPq, Capes and MackPesquisa for the financial support. Special thanks go to Intel, which sponsors the Natural Computing and Machine Learning Laboratory as a Center of Excellence in Machine Learning.
REFERENCES
Agarwal, D., Gupta, A., & Singh, P. K. (2016). A Systematic Review on Artificial Bee Colony Optimization
Technique. International Journal of Control Theory and Applications, 9(11), 5487–5500.
Angel, R. D., Caudle, W. L., Noonan, R., & Whinston, A. (1972). Computer-Assisted School Bus Scheduling.
Management Science, 18(6), 279–288. doi:10.1287/mnsc.18.6.B279
Banharnsakun, A., Achalakul, T., & Sirinaovakul, B. (2010). ABC-GSX: A hybrid method for solving the
Traveling Salesman Problem. In Second World Congress on Nature and Biologically Inspired Computing (NaBIC)
(pp. 7-12). doi:10.1109/NABIC.2010.5716308
Basturk, B., & Karaboga, D. (2006). An artificial bee colony (ABC) algorithm for numeric function optimization.
In IEEE Swarm Intelligence Symposium (pp. 687-697).
Bektas, T. (2006). The Multiple Traveling Salesman Problem: An Overview of Formulations and Solution
Procedures. Omega, 34(3), 209–219. doi:10.1016/j.omega.2004.10.004
Bitam, S., Batouche, M., & Talbi, E. G. (2010), A Survey on Bee Colony Algorithms. In 2010 IEEE International
Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
Blum, C., & Merkle, D. (2008). Swarm Intelligence: Introduction and Applications, Natural Computing Series.
Springer. doi:10.1007/978-3-540-74089-6
Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems.
Oxford University Press.
Christofides, N., & Eilon, S. (1969). An algorithm for the vehicle-dispatching problem. The Journal of the
Operational Research Society, 20(3), 309–318. doi:10.1057/jors.1969.75
Christofides, N., Mingozzi, A., & Toth, P. (1979). The vehicle routing problem.
Cunha, A. G., Takahashi, R., & Antunes, C. H. (2012). Evolutionary Computing and Metaheuristics Manual.
Coimbra: Imprensa da Universidade de Coimbra. doi:10.14195/978-989-26-0583-8
da Cunha, C. B. (2000). Practical Aspects of the Application of Vehicle Routing Models to Real World Problems.
Transportes, 8(2). (in Portuguese)
Davis, L. (1991). Handbook of Genetic Algorithms. Van Nostrand Reinhold.
de Castro, L. N. (2007). Fundamentals of Natural Computing: An Overview. Physics of Life Reviews, 4(1), 1–36.
doi:10.1016/j.plrev.2006.10.002
Dorigo, M., Birattari, M., & Stutzle, T. (2006). Ant colony optimization. IEEE Computational Intelligence
Magazine, 1(4), 28–39. doi:10.1109/MCI.2006.329691
Eberhart, R., & Shi, Y. (2001). Particle Swarm Optimization: Developments, Applications and Resources. In
Congress on Evolutionary Computation (pp. 81–86). IEEE.
Girsang, A. S., Tsai, C. W., & Yang, C. S. (2012). A Fast Bee Colony Optimization for Traveling Salesman
Problem. In 2012 Third International Conference on Innovations in Bio-Inspired Computing and Applications
(IBICA). doi:10.1109/IBICA.2012.44
Gutin, G., & Punnen, A. (2002). The Traveling Salesman Problem And Its Variations. Springer Science &
Business Media.
Johnson, D. S., & McGeoch, L. A. (1997). The Traveling Salesman Problem: A Case Study in Local Optimization.
In Local Search in Combinatorial Optimization (Vol. 1, pp. 215-310).
Karaboga, D., & Akay, B. (2009). A Survey: Algorithms Simulating Bee Swarm Intelligence. Artificial Intelligence
Review, 31(1-4), 61–85. doi:10.1007/s10462-009-9127-4
Karaboga, D., & Gorkemli, B. (2011), A Combinatorial Artificial Bee Colony Algorithm for Traveling Salesman
Problem. In 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA)
(pp. 50-53). doi:10.1109/INISTA.2011.5946125
Karaboga, D., Gorkemli, B., Ozturk, C., & Karaboga, N. (2014). A comprehensive survey: artificial bee colony
(ABC) algorithm and applications. Artificial Intelligence Review, 42(1), 21–57. doi:10.1007/s10462-012-9328-0
Kıran, M. S., İşcan, H., & Gündüz, M. (2013). The analysis of discrete artificial bee colony algorithm with
neighborhood operator on traveling salesman problem. Neural Computing & Applications, 23(1), 9–21.
doi:10.1007/s00521-011-0794-0
Kocer, H. E., & Akca, M. R. (2014). An improved artificial bee colony algorithm with local search for traveling
salesman problem. Cybernetics and Systems, 45(8), 635–649. doi:10.1080/01969722.2014.970396
Laporte, G. (1992). The traveling salesman problem: an overview of exact and approximate algorithms. European
Journal of Operational Research, 59(2), 231–247. doi:10.1016/0377-2217(92)90138-Y
Laporte, G. (2009). Fifty Years of Vehicle Routing. Transportation Science, 43(4), 408–416. doi:10.1287/
trsc.1090.0301
Laporte, G., Gendreau, M., Potvin, J.-Y., & Semet, F. (2000). Classical and modern heuristics for the vehicle routing
problem. International Transactions in Operational Research, 7(4-5), 285–300. doi:10.1111/j.1475-3995.2000.
tb00200.x
Lawler, E. (1985). The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley
& Sons, Incorporated.
Lučić, P., & Teodorović, D. (2003). Computing With Bees: Attacking Complex Transportation Engineering
Problems. International Journal of Artificial Intelligence Tools, 12(03), 375–394. doi:10.1142/
S0218213003001289
Luke, S. (2013). Essentials of Metaheuristics (2nd ed.). Lulu.
Maia, R. D., de Castro, L. N., & Caminhas, W. M. (2012). Bee Colonies As Model For Multimodal Continuous
Optimization: The OptBees Algorithm. In IEEE Congress on Evolutionary Computation (CEC).
Maia, R. D., de Castro, L. N., & Caminhas, W. M. (2013). Collective Decision-Making by Bee Colonies as Model
for Optimization-The Optbees Algorithm. Applied Mathematical Sciences, 7(87), 4327–4351. doi:10.12988/
ams.2013.35271
Marinakis, Y., Marinaki, M., & Dounias, G. (2008), Honey Bees Mating Optimization Algorithm for the Vehicle
Routing Problem. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2007) (pp. 139-148).
doi:10.1007/978-3-540-78987-1_13
Marinakis, Y., Marinaki, M., & Dounias, G. (2011). Honey Bees Mating Optimization Algorithm for the Euclidean
Traveling Salesman Problem. Information Sciences, 181(20), 4684–4698. doi:10.1016/j.ins.2010.06.032
Marinakis, Y., Migdalas, A., & Pardalos, P. M. (2005). Expanding Neighborhood GRASP for the Traveling
Salesman Problem. Computational Optimization and Applications, 32(3), 231–257. doi:10.1007/s10589-005-4798-5
Marinakis, Y., Migdalas, A., & Pardalos, P. M. (2009). Multiple phase neighborhood Search - GRASP based
on Lagrangean relaxation, random backtracking Lin–Kernighan and path relinking for the TSP. Journal of
Combinatorial Optimization, 17(2), 134–156. doi:10.1007/s10878-007-9104-2
Masutti, T. A. S., & de Castro, L. N. (2016a), TSPoptBees: A Bee-Inspired Algorithm to Solve the Traveling
Salesman Problem. In 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto
(pp. 593-598). doi:10.1109/IIAI-AAI.2016.148
Masutti, T. A. S., & de Castro, L. N. (2016b), Parameter Analysis of a Bee-Inspired Algorithm to Solve the
Travelling Salesman Problem. In Proc. Of the 15th IAESTED Conference, Intelligent Systems and Control (ISC
2016) (pp. 245-252).
Masutti, T. A. S., & de Castro, L. N. (2017). Bee-Inspired Algorithms Applied to Vehicle Routing Problems: A
Survey and a Proposal. Mathematical Problems in Engineering. doi:10.1155/2017/3046830
Michalewicz, Z., & Fogel, D. (2013). How to Solve It: Modern Heuristics. Springer Science & Business Media.
Reinelt, G. (1991). TSPLIB—A Traveling Salesman Problem Library. ORSA Journal on Computing, 3(4),
376–384. doi:10.1287/ijoc.3.4.376
Ruiz-Vanoye, J. A., Díaz-Parra, O., Cocón, F., Soto, A., Arias, M. A. B., Verduzco-Reyes, G., & Alberto-Lira, R.
(2012). Meta-heuristics algorithms based on the grouping of animals by social behavior for the traveling salesman
problem. International Journal of Combinatorial Optimization Problems and Informatics, 3(3), 104–123.
Toth, P., & Vigo, D. (2001). The Vehicle Routing Problem. Society for Industrial and Applied Mathematics. SIAM.
Verma, B. K., & Kumar, D. (2013). A Review on Artificial Bee Colony Algorithm. IACSIT International Journal
of Engineering and Technology, 2(3), 175–186.
Wong, L. P., Low, M. Y. H., & Chong, C. S. (2008), A Bee Colony Optimization Algorithm for Traveling
Salesman Problem. In Second Asia International Conference on Modelling & Simulation (pp. 818-823).
doi:10.1109/AMS.2008.27
Yang, X.-S. (2010). Nature-Inspired Metaheuristic Algorithms. Luniver Press.
Zang, H., Zhang, S., & Hapeshi, K. (2010). A Review of Nature-Inspired Algorithms. Journal of Bionics
Engineering, 7(Suppl.), S232–S237. doi:10.1016/S1672-6529(09)60240-7
Zhang, X., Bai, Q., & Yun, X. (2011), A New Hybrid Artificial Bee Colony Algorithm for the Traveling
Salesman Problem. In IEEE 3rd International Conference on Communication Software and Networks (ICCSN)
(pp. 155-159).
Thiago A. S. Masutti holds a degree in Electrical Engineering - Computing Modality from the Catholic University
of Santos (2009) and an MSc in Computer Engineering from the Mackenzie University (2016). He has experience
in the area of Computer Science, with emphasis on Natural Computing, working mainly on the following subjects:
artificial immune systems, combinatorial optimization, and bio-inspired computing.
Leandro Nunes de Castro has a BSc in Electrical Engineering from the Federal University of Goias, an MSc in Electrical Engineering from Unicamp, a PhD in Computer Engineering from Unicamp and an MBA in Strategic Business Management from the Catholic University of Santos. He was a Research Associate at the University of Kent at Canterbury from 2001 to 2002, a Research Fellow at Unicamp from 2002 to 2006, and a Visiting Lecturer at the University Technology Malaysia in September 2005. He is currently a Professor in the Graduate Program in Electrical Engineering at the Mackenzie University, where he founded and leads the Natural Computing Laboratory (LCoN). He is acknowledged by the Brazilian Research Council as a leading researcher in Computer Science, and was recognized in 2011 as one of the most cited authors in Computer Science in the country. His main research lines are Natural Computing and Data Mining, emphasizing Artificial Immune Systems, Neural Networks, Swarm Intelligence, Evolutionary Algorithms and real-world applications. He was the founding Chief Editor of the International Journal of Natural Computing Research and a member of the committees of several conferences. He has published eight books, some authored and others organized, and over 150 scientific papers.
Analysis of Feature Selection
and Ensemble Classifier Methods
for Intrusion Detection
H.P. Vinutha, Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India
Poornima Basavaraju, Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India
ABSTRACT
Day by day, network security is becoming a more challenging task. Intrusion detection systems (IDSs) are one of the methods used to monitor network activities. Data mining algorithms play a major role in the field of IDS. The NSL-KDD’99 dataset is used to study network traffic patterns, which helps to identify possible attacks that take place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L and U2R. In the proposed methodology, the false positive rate is reduced and the detection rate improved by reducing the dimensionality of the dataset, since using all 41 attributes in the detection stage is not good practice. Four feature selection methods, Chi-Square, Symmetrical Uncertainty (SU), Gain Ratio and Information Gain, are used to evaluate the attributes, and unimportant features are removed to reduce the dimension of the data. Ensemble classification techniques like Boosting, Bagging, Stacking and Voting are used to observe the detection rate separately with three base algorithms: Decision Stump, J48 and Random Forest.
Keywords
Accuracy, Bagging, Boosting, Dimensionality Reduction, DoS, Ensemble Techniques, Feature Selection, Intrusion Detection System, NSL-KDD Dataset, Probe, R2L, Stacking, U2R, Voting, Detection Rate
DOI: 10.4018/IJNCR.2018010104

INTRODUCTION

Antivirus software, message encryption, password protection, firewalls, secured network protocols, etc., are not enough to provide security in a network. Intruders may attack the network using many unpredictable methods. Monitoring the activities and actions on network systems or a computer network using Intrusion Detection Systems (IDSs) helps to analyze possible incidents. The first IDS was introduced by James P. Anderson in 1980, and later, in 1987, D. Denning enhanced it. IDSs play a major role in the data integrity, confidentiality and availability of the network. Among the consequences of intrusions, confidential and sensitive information may be maligned, and available resources and functionalities may be hacked. Logs of the means and modality of various attacks are easily produced by an IDS, which helps to prevent possible attacks in the future. Nowadays, organizations are provided with a good source of security analysis by using an IDS. IDSs are divided into two categories based on their architecture
and functionality: host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS). A HIDS runs on an individual host machine, while a NIDS sits within the network to monitor the traffic to and from it. This proposed work concentrates on a NIDS in order to monitor the network traffic and identify whether the incoming traffic is normal or anomalous. There are two important modes and methods to operate an IDS for packet analysis on the network: anomaly detection and misuse detection.
Anomaly Detection
Anomaly detection is a method of scanning for abnormal activities that occur on the network. It maintains a log of the activities that take place on the network, and this information can be used for comparison with all the activity taking place on the network. Using this information, new rules can be defined for new kinds of activity; any deviation from normal activity can be flagged as an anomaly. Some common examples of rule-based systems are the Minnesota Intrusion Detection System (MINDS), the Intrusion Detection Expert System (IDES), the Next Generation Intrusion Detection Expert System (NGIDES), etc.
Misuse Detection
Another IDS method is misuse detection. In this method, network activities are compared with pre-defined signatures. Signatures are sets of characteristic features that give specific patterns of attack. These sets of characteristic features are stored for comparison with the network activity; if any pattern differs from the pre-defined patterns, it can be considered an attack. Some common signature-based tools are Snort, Suricata, Kismet, HONEYD, Bro IDS, etc.
Current IDSs face many drawbacks, the major ones being false positives and false negatives. A false positive occurs when normal behavior is considered a malicious attack; a false negative occurs when an actual attack takes place but goes undetected. Data mining is one of the fields that has contributed many techniques to IDSs; techniques like data summarization, visualization, clustering and classification help to accomplish the task. Along with these drawbacks, IDSs face a major challenge with Big Data, because a huge volume of data has to be managed.
Detection Issues
There are four main types of detection issues in an IDS, which arise depending on the type of alarm in the intrusion scenario. IDSs encounter the following types of detection issues:

• True Positive: The IDS raises an alarm when an actual attack occurs.
• True Negative: The IDS does not raise an alarm when no attack occurs.
• False Positive: The IDS generates an alarm when no attack takes place.
• False Negative: The IDS does not generate an alarm when an actual attack takes place.
Intrusion detection is becoming a very challenging task nowadays. The dataset used for IDS is huge and contains many irrelevant features and sometimes redundant relevant features. If the system makes use of all the features in the dataset during intrusion detection, the analysis of intrusions becomes very difficult, because a large number of features makes it hard to identify suspicious behaviors. This reduces the detection performance and the efficiency of the model. So, it is necessary to reduce the dimension of the dataset before applying data mining approaches such as classification, clustering, association rules and regression. Feature selection methods are used as a pre-processing step to reduce the dimensionality of the dataset.
DATASET
The dataset plays a major role in verifying the effectiveness of an IDS, using either a real-time dataset or a benchmark dataset. The benchmark dataset is a simulated dataset designed and developed by the MIT Lincoln Labs for research purposes, and most research work is carried out using benchmark datasets. In 1999, the DARPA 98 traffic was summarized into network connection records with 41 features per connection; to align with the International Knowledge Discovery and Data Mining Tools Competition, this became the KDD99 benchmark dataset. The major drawback of the KDD99 dataset is that it contains many redundant and incomplete records. To overcome these drawbacks the NSL-KDD 99 dataset was designed. The dataset contains 42 features, including the class attribute labeled normal or anomalous. The attributes of each network connection vector fall into one of the following categories: Basic, Traffic, Content and Host. Attributes have their own value types: nominal, binary and numeric. Table 1 (Preethi, 2016; Deepak, 2016) gives the detailed attribute list. The records in the NSL-KDD dataset are split into training and testing datasets labeled as normal or attack. There are 37 actual attack types, of which 21 are considered in the training dataset and the remaining 16 appear in the testing dataset along with the training attacks. Table 2 gives the details of the attack categories, which are divided into four main groups: Denial of Service (DoS), Probing (Probe), Remote to Local (R2L) and User to Root (U2R). The dataset contains 12579 instances, and sometimes it is necessary to reduce the size of the dataset for faster computation. This can be achieved by transforming some of the values from nominal to numeric using a transformation table. Table 3 gives the transformation table.
RELATED WORK
In the papers (Umamaheswari, 2016; Kaur, 2016; Osama, 2016) the authors concentrated on AdaBoost, Bagging and Stacking, which are ensemble methods. Different machine learning algorithms are combined with ensembles so that better results are obtained from these combinations. The KDDCup’99 dataset is used to conduct the experiments, and preprocessing is done with feature selection methods for better results.
Table 1. Dataset description
Table 2. Attack categories
Table 3. Transformation table
In (Preethi, 2016; Deepak, 2016) the authors give complete details about the NSL-KDD’99 dataset. The dataset attributes are divided into four parts: Host, Content, Traffic and Basic. They discuss the contribution of each category to reducing the false alarm rate and improving the detection rate.
In (Visalakshi, 2015; Radha, 2015) we see that feature selection plays a major role in classification and detection techniques. If the number of features is large, the dimensionality of the data is also large; feature selection is a common technique used with all high-dimensional input data to select a subset of features. The authors used the wrapper method to select a subset of features from a drinking water dataset.
In (Srivastava, 2014) the author analyzes the very popular basic principles of data mining. Large amounts of data are available, and they have to be turned into useful information in every field. The data mining tool WEKA is used to preprocess the data, irrelevant features are removed using different feature selection algorithms, and classification and evaluation are done using different classifiers. Finally, it is concluded that the WEKA API can be extended with many improvements.
In the papers (Vanaja, 2014; Kumar, 2014) the authors mention various feature selection algorithms used on datasets in order to select the features that allow classifiers to work more accurately. Different feature selection approaches are used, such as the Filter, Wrapper and Embedded methods. Correlation feature subset, Gain Ratio, Genetic algorithms, Hill climbing, Info Gain, etc., are the feature selection methods discussed in that article. These methods are used with different search techniques to identify which features are relevant and which are irrelevant. The results are observed to find well-ranked features that can be used to improve the performance and accuracy of any classifiers applied to them.
In (Patel, 2014; Tiwari, 2014) ensemble techniques are used in an intrusion detection system. Different machine learning algorithms are combined to build the ensembles. The experiment is conducted on each class of attack separately together with the normal category. The authors applied the bagging ensemble method and showed that the combination of methods gives better results than the individual algorithms.
FEATURE SELECTION
Feature selection is a very important step in building an intrusion detection model. The dataset contains a huge number of attributes or features, but in most cases many of the features present in the dataset are of little or no use. A huge number of attributes in the dataset reduces the performance and the detection rate, because processing the data itself takes more time. In such cases a reduction in features is required, and the selection of good features is also important. To achieve this, using data mining techniques to pre-process the data is essential. Feature (or attribute) selection methods are of three types: the Filter method, the Wrapper method and the Embedded method. The Filter and Wrapper methods are the most commonly used to select attributes. The Filter method does not depend on any type of classifier, but Wrapper methods depend on a classifier (Sheena, 2010; Kumar, 2010). The Wrapper method is combined with an induction algorithm, so it is more time-consuming than the Filter method.

Machine learning works by a simple rule: noisy input data gives noisy output. This rule is very important when the number of features in the dataset is large. Not every feature is useful for machine learning, so to select the important features some feature selection algorithms are applied and the lowest-ranked features are removed. The dimensionally reduced dataset can then be used to analyze the classification algorithms.

In our work we concentrate on four feature selection algorithms: Gain Ratio, Chi-Squared, Information Gain and Symmetrical Uncertainty (SU). The Ranker method is used as the search technique to rank the features. The ranking indicates which features are the best and which are not much required for further processing (Sheena, 2010; Kumar, 2010).
Chi-Squared (X² Statistic)

Chi-square is a filter-based feature selection algorithm that measures the lack of independence between the class and the term for each degree of distribution freedom. The expression is given in Equation (1), where D is the total number of documents, E is the expected outcome, P is the number of documents in class C containing term t, Q is the number of documents containing t that occur without C, M is the number of documents in class C that occur without t, and N is the number of documents of other classes without t (Sakar, 2013; Goswami, 2013):

$X^2(t, c) = \dfrac{D \cdot (PE - MQ)^2}{(P + M)(Q + N)(P + Q)(M + N)}$ (1)
Symmetrical Uncertainty (SU)
In this algorithm, the set of attributes is evaluated by measuring the correlation between each feature and the target classes, as shown in Equation (2), where H(X) and H(Y) are the entropies based on the probabilities associated with each feature and class value, respectively, and H(X,Y) is the joint entropy of X and Y over all combinations of their values (Karimi, 2013; Mansour, 2013; Ali, 2013):

$SU = \dfrac{H(X) + H(Y) - H(X,Y)}{H(X) + H(Y)}$ (2)
Gain Ratio Attribute Evaluation
In the Gain Ratio evaluation method, each attribute is scored by considering its gain value with respect to the split information, as shown in Equation (3) (Aggarwal, 2013; Amritha, 2013):

$\text{GainRatio}(A) = \text{Gain}(A) / \text{SplitInfo}(A)$ (3)
Information Gain
In Information Gain, each feature is scored based on how much information it provides about the classes. The equation is given in Equation (4), where H(Y) and H(Y|X) are the entropy of Y and the conditional entropy of Y given X, respectively (Karimi, 2013; Mansour, 2013):

$IG(X) = H(Y) - H(Y|X)$ (4)
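In this work the rankings are computed in WEKA with the Ranker search; an analogous feature ranking can be sketched in Python with scikit-learn, using the chi-squared score directly and mutual information as a stand-in estimate of information gain (not WEKA's exact evaluators):

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

def rank_features(X, y):
    """Rank features by chi-squared score and by an information-gain
    estimate, highest score first."""
    chi_scores, _ = chi2(X, y)                         # X must be non-negative
    ig_scores = mutual_info_classif(X, y, random_state=0)
    order_chi = np.argsort(chi_scores)[::-1]           # best features first
    order_ig = np.argsort(ig_scores)[::-1]
    return order_chi, order_ig
```

The lowest-ranked attributes returned by either ordering are the candidates for removal before classification.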
ENSEMBLE TECHNIQUES
The performance of classifiers can be improved by combining single classifiers into ensemble classifiers. Ensemble techniques use divide-and-conquer approaches, which are more effective and efficient in nature (Iwan, 2012; Ed, 2012; Adam, 2012; Gray, 2012). In the divide-and-conquer method, a complex problem is divided into a number of sub-problems so that it becomes easier to analyze and solve. The main advantage of such approaches is that they give more accurate results compared to single algorithms. In ensemble techniques one base model is used to classify the data, and the output of that base model is given for further classification. In our proposed approach Decision Stump, J48 and Random Forest are used as base algorithms. In this paper we evaluate the ensemble classifier techniques called Boosting, Bagging, Stacking and Voting (Syaif, 2012; Zaluska, 2012; Wills, 2012).
Boosting Ensemble Method
Boosting is a general ensemble technique that builds a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been added. AdaBoost was the first really successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.
For choosing the correct distribution, the steps are as follows:
Step 1: The base learner takes all of the distributions and assigns equal weight to each observation.
Step 2: If there is any prediction error caused by the first base learning algorithm, then we pay more attention to the observations with prediction errors. We then apply the next base learning algorithm.
Step 3: Iterate Step 2 until the limit of the base learning algorithm is reached or a higher accuracy is achieved.
Finally, boosting combines the outputs of the weak learners to create a strong learner, which eventually improves the predictive power of the model. Boosting concentrates on instances that are mis-classified or have higher errors under the preceding weak rules.
Types of Boosting Algorithms
The basic engine used by a boosting algorithm can be anything: a decision stump, a margin-maximizing classification algorithm, and so on. Many boosting algorithms use different engines, for example:
1. AdaBoost (Adaptive Boosting)
2. Gradient Tree Boosting
3. XGBoost
The AdaBoost procedure is given in Algorithm 1.

Algorithm 1. AdaBoost
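As a hedged stand-in for the WEKA implementation used in this paper, here is a minimal AdaBoost sketch in scikit-learn with a decision stump as the weak learner (synthetic data; the `estimator` keyword follows scikit-learn 1.2+):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the NSL-KDD data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A depth-1 tree is a decision stump, the classic AdaBoost weak learner.
stump = DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)
print("mean CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```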
Bagging
Bagging (Breiman, 1996), a name derived from "bootstrap aggregation", was the first effective method of ensemble learning and is one of the simplest methods of arcing. The meta-algorithm, which is a special case of model averaging, was originally intended for classification and is usually applied to decision tree models, but it can be used with any model for classification or regression. The technique builds multiple versions of a training set using the bootstrap, i.e. sampling with replacement. Each of these datasets is used to train a different model. The outputs of the models are combined by averaging (in the case of regression) or voting (in the case of classification) to produce a single output. Bagging is only effective when using unstable (i.e. a small change in the training set can cause a significant change in the model) nonlinear models. Algorithm 2 shows the Bagging procedure, and a minimal sketch of its use follows.

Algorithm 2. Bagging
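A minimal bagging sketch (scikit-learn standing in for WEKA; synthetic data and illustrative settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# bootstrap=True resamples the training set with replacement; each of
# the 25 trees trains on a different bootstrap sample and the class
# votes are combined, exactly the bagging recipe described above.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=25, bootstrap=True, random_state=0)
print("mean CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```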
Stacking
Stacking works similarly to boosting: it also applies several models to the original data. Stacking (sometimes called stacked generalization) involves training a learning algorithm to combine the predictions of several other learning algorithms. First, all of the other algorithms are trained using the available data; then a combiner algorithm is trained to make a final prediction using all the predictions of the other algorithms as additional inputs. If an arbitrary combiner algorithm is used, stacking can theoretically represent any of the ensemble techniques described in this article, although in practice a single-layer logistic regression model is often used as the combiner.
Stacking typically yields performance better than any single one of the trained models. It has been successfully used on both supervised learning tasks (regression, classification and distance learning) and unsupervised learning (density estimation). It has also been used to estimate bagging's error rate, and it has been reported to out-perform Bayesian model averaging. The two top performers in the Netflix competition used blending, which may be considered a form of stacking. Algorithm 3 shows the stacking procedure, and a minimal sketch of its use follows.

Algorithm 3. Stacking
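A minimal stacking sketch (scikit-learn stand-in; synthetic data, and the logistic-regression combiner matches the common choice noted above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Base learners' out-of-fold predictions become extra inputs to the
# combiner, which makes the final prediction.
stack = StackingClassifier(
    estimators=[("stump", DecisionTreeClassifier(max_depth=1)),
                ("rf", RandomForestClassifier(n_estimators=50,
                                              random_state=0))],
    final_estimator=LogisticRegression())
print("mean CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```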
Voting
Voting and averaging are two of the simplest ensemble methods, and both are easy to understand and implement. Voting is used for classification and averaging is used for regression. In both methods, the first step is to create multiple classification or regression models using some training dataset. Each base model can be created using different splits of the same training dataset and the same algorithm, using the same dataset with different algorithms, or by some other method.
Majority Voting
Each model makes a prediction (votes) for each test instance, and the final output prediction is the one that receives more than half of the votes. If none of the predictions gets more than half of the votes, we may say that the ensemble method could not make a stable prediction for this instance. Although this is a widely used approach, one may instead take the most-voted prediction (even if it received less than half of the votes) as the final prediction; a minimal sketch follows.
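A minimal hard-voting sketch (scikit-learn stand-in; synthetic data, illustrative base models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# voting="hard": each model casts one vote per instance and the
# most-voted class is returned, i.e. majority voting.
vote = VotingClassifier(
    estimators=[("stump", DecisionTreeClassifier(max_depth=1)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=50,
                                              random_state=0))],
    voting="hard")
print("mean CV accuracy:", cross_val_score(vote, X, y, cv=5).mean())
```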
Weighted Voting
In weighted voting, the predictions of the better models are counted multiple times. Finding a reasonable set of weights is up to the modeler.
Simple Averaging
In the simple averaging method, the average of the predictions is calculated for every instance of the test dataset. This method often reduces overfitting and creates a smoother regression model. A sketch combining weighted voting with averaged probabilities follows.
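A minimal sketch of weighted, probability-averaged voting (scikit-learn stand-in; the weights are invented to show the mechanics):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# voting="soft" averages the predicted class probabilities; the weights
# make the (presumably stronger) random forest count three times as much.
weighted = VotingClassifier(
    estimators=[("stump", DecisionTreeClassifier(max_depth=1)),
                ("rf", RandomForestClassifier(n_estimators=50,
                                              random_state=0))],
    voting="soft", weights=[1, 3])
print("mean CV accuracy:", cross_val_score(weighted, X, y, cv=5).mean())
```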
Figure 1 gives the proposed method. The first and most important step is to pre-process the data, because the obtained dataset may contain much incomplete, meaningless, inconsistent and noisy data. At the pre-processing stage, the non-numeric features of the NSL-KDD dataset are converted into numeric features by preparing a transformation table at our convenience; the transformation table used here is given in Table 3. Attack names come with sub-attack names, and such names have to be assigned to their main categories: normal, DoS, Probe, R2L and U2R. The attack category table is given in Table 2. Features in the NSL-KDD dataset may contain discrete or continuous values; these features are normalized to min-max ranges, as in the sketch below. After pre-processing we extract the necessary features by applying different feature selection methods. In every method, all 41 attributes are ranked using the ranker's algorithm together with the feature selection measure. We observe the rank matrix of all the attributes, select the highest-ranked attributes, and remove the least-ranked attributes, known as irrelevant attributes. Dimensionality is reduced by removing the irrelevant features. Then the ensemble approaches are applied with different algorithms as base algorithms.
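The two pre-processing steps can be sketched as follows (the mapping below is an invented stand-in for the paper's Table 3, not a reproduction of it, and the rows are toy records):

```python
import pandas as pd

# Toy rows standing in for NSL-KDD records.
df = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp", "tcp"],
                   "duration": [0, 120, 5, 3600]})

# Transformation table: non-numeric feature -> numeric code.
protocol_map = {"tcp": 0, "udp": 1, "icmp": 2}
df["protocol_type"] = df["protocol_type"].map(protocol_map)

# Min-max normalization of a continuous feature into [0, 1].
col = df["duration"]
df["duration"] = (col - col.min()) / (col.max() - col.min())
print(df)
```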
EXPERIMENTAL STUDY
To select the best subset of features, an experiment is conducted with various feature selection approaches. The experiment is performed and analyzed using the data mining tool WEKA 3.6; the input dataset for this tool must be saved in the ARFF file format (see the loading sketch below). All 41 features are used to select the best subset of features. Every approach is paired with the ranker search method, and the results are shown in separate graphs.

Figure 1. Proposed methodology
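For readers working outside WEKA, the same ARFF file can be inspected from Python via scipy's ARFF reader (a sketch; 'KDDTrain.arff' is a placeholder filename, not the paper's artifact):

```python
import pandas as pd
from scipy.io import arff

data, meta = arff.loadarff("KDDTrain.arff")
df = pd.DataFrame(data)
print(meta.names()[:5], df.shape)
```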
Results are obtained for the chi-square feature selection method: all the features are evaluated and ranked from most important to least important. Here, 18 attributes are ranked very low: 1, 2, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 27, 28 and 41 (see Figure 2).
The SU method is used with the ranker algorithm to evaluate the features from highest to lowest, and the results are shown in Figure 3. Attributes 1, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 and 23 are the least-ranked attributes.
When the gain ratio method is applied to the dataset, every feature is again ranked using the ranker method. As seen in Figure 4, attributes 1, 7, 13, 15, 16, 19, 20, 21, 24, 28, 32, 36, 40 and 41 are the least ranked.

Figure 2. Chi-Square attributes evaluation
Figure 3. Symmetrical Uncertainty (SU) attributes evaluation
Figure 4. Gain Ratio attributes evaluation
All 41 features are ranked from highest to lowest for information gain using the ranker's algorithm; the graph is shown in Figure 5. By observation, attributes 1, 2, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 27, 28, 40 and 41 are the least ranked.

Figure 5. Information Gain attributes evaluation
As observed from the above graphs, features 1, 7, 13, 15, 16, 19, 20 and 21, named Duration, Land, Num_compromised, Su_attempted, Num_root, Num_access_files, Num_outbound_cmds and Is_host_login respectively, are least ranked in all cases. Features 8, 9, 11, 14, 17, 18, 22 and 41, named Wrong_fragment, Urgent, Num_failed_logins, Root_shell, Num_file_creations, Num_shells, Is_guest_login and Dst_host_srv_rerror_rate respectively, are low ranked in at least three of the four cases. So finally, these 16 features are removed to reduce the dimensionality of the dataset, as in the sketch below.
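The removal step can be sketched in pandas ('KDDTrain.arff' is again a placeholder filename; the indices are the 16 low-ranked attributes listed above, 1-based, mapped to 0-based column positions):

```python
import pandas as pd
from scipy.io import arff

data, _ = arff.loadarff("KDDTrain.arff")
df = pd.DataFrame(data)
low_ranked = [1, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 41]
df_reduced = df.drop(columns=df.columns[[i - 1 for i in low_ranked]])
print(df_reduced.shape)  # 26 attributes should remain out of 42
```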
The experiment is conducted on the NSL-KDD dataset, which contains 125,973 instances and 42 attributes including one class attribute. These 42 attributes are reduced by removing the 16 unimportant attributes, after which the dataset has 26 attributes for the same number of instances. This dimensionally reduced dataset is used to evaluate the ensemble techniques. The ensemble methods AdaBoost, Bagging, Stacking and Voting are used with three separate base classifiers: Decision Stump, J48 and Random Forest. Every ensemble classifier is evaluated with these three base classifiers in terms of Detection Rate (DR), False Positive Rate (FPR) and Accuracy (ACR). The time taken to build a model is also noted, to show that taking less time to build a model does not improve the efficiency of the algorithm. These evaluation metrics are calculated using the following equations (Aldhyani, 2014; Joshi, 2014):
DR = (Total number of detected attacks / Total number of attacks) * 100%

FPR = (Total number of misclassified processes / Total number of normal processes) * 100%

ACR = (Total number of correctly classified processes / Total number of processes) * 100%
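As a direct transcription of the three equations (the counts below are invented purely to show the arithmetic):

```python
def detection_rate(detected_attacks, total_attacks):
    return detected_attacks / total_attacks * 100

def false_positive_rate(misclassified_normals, total_normals):
    return misclassified_normals / total_normals * 100

def accuracy(correctly_classified, total_processes):
    return correctly_classified / total_processes * 100

print(detection_rate(9988, 10000),      # DR  = 99.88
      false_positive_rate(114, 10000),  # FPR = 1.14
      accuracy(9900, 10000))            # ACR = 99.00
```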
Table 4 gives the detailed analysis of the time taken to build a model, DR, FPR and ACR. Graphs for every ensemble classifier and its base classifiers are shown in Figures 6, 7 and 8 for detailed analysis.
Table 4. Results of Ensemble Method for different base classifiers

Ensemble Method | Base Classifier | Time to build model (sec) | Detection Rate (%) | False Positive Rate (%) | Accuracy (%)
--------------- | --------------- | ------------------------- | ------------------ | ----------------------- | ------------
AdaBoost        | Decision Stump  | 7.25                      | 83.20              | 12.00                   | 87.37
AdaBoost        | J48             | 116.2                     | 99.88              | 1.14                    | 98.87
AdaBoost        | Random Forest   | 700.47                    | 99.89              | 1.07                    | 99.00
Bagging         | Decision Stump  | 13.68                     | 83.15              | 16.84                   | 83.15
Bagging         | J48             | 118.59                    | 99.81              | 0.19                    | 99.81
Bagging         | Random Forest   | 85.21                     | 99.89              | 0.11                    | 99.89
Stacking        | Decision Stump  | 0.92                      | 53.45              | 46.54                   | 53.45
Stacking        | J48             | 0.69                      | 53.45              | 46.54                   | 53.45
Stacking        | Random Forest   | 2.26                      | 53.45              | 46.54                   | 53.45
Bagging and Boosting with the base algorithms J48 and Random Forest give, on average, more than 99% accuracy, compared with an average of 94% when J48 is used as a single algorithm (Tigabu, 2012). As Voting is a straightforward method, it does not include any algorithm as a base algorithm; it simply produces its value in a single run in very little time, and its detection rate is very low.

Figure 6. AdaBoost Classifier

Figure 7. Bagging Classifier

Figure 8. Stacking Classifier
CONCLUSION
In this work we have focused on a complete study of the NSL-KDD'99 dataset and on dimensionality reduction using the Chi-Square, Information Gain, Gain Ratio and Symmetrical Uncertainty feature selection methods. By observing the output of each method, some of the irrelevant attributes are removed from the list. The reduced feature set is used to observe the detection rate and false positive rate of the ensemble methods. The results show that ensemble methods perform well compared to single classifiers. Graphs are plotted to show the importance of every attribute, so that the less important features can be removed. The ensemble study shows that Boosting and Bagging are the strongest algorithms compared to Stacking and Voting. It is also shown that taking less time to build a model does not improve the detection rate, false positive rate or accuracy.
REFERENCES
Aggarwal, M. (2013, June). Performance analysis of different feature selection method in intrusion detection.
International Journal of Scientific & Technology Research, 2(6).
Aggarwal, P., & Dahiya, D. (2016). Contribution of four labeled attribute of KDD dataset on detection and
false alarm rate for intrusion detection system. Indian Journal of Science and Technology, 9(5). doi:10.17485/
ijst/2016/v9i5/83656
Akal, T.D. (2012). Constructing Predictive Model for Network Intrusion Detection [Thesis].
Al Mehedi Hasan, M. (2016). Feature selection for intrusion detection using random forest. Journal of Information Security, 7(3), 129–140. doi:10.4236/jis.2016.73009
Aldhyani, T. H., & Joshi, M. R. (2014). Analysis of dimensionality reduction in intrusion detection. International Journal of Computational Intelligence and Informatics, 4(3).
Bajaj, K., & Arora, A. (2013). Improving the intrusion detection using discriminative machine learning approach
and improve the time complexity by data mining feature selection method. International Journal of Computers
and Applications, 76(1).
Bisht, N., Ahmad, A., & Pant, A. K. (2017), Analysis of classification ensembles for network intrusion detection
systems. Communication of Applied Electronics, 6(7).
Chae, H. S., Jo, B. O., Choi, S. H., & Park, T. K. (2013). Feature selection for intrusion detection using nsl-kdd.
In Recent advances in computer science (pp. 184-187).
Dewa, Z., & Maglaras, L. A. (2016). Data mining and intrusion detection systems. International Journal of
Advanced Computer Science and Applications, 7(1).
Dongre, S. S., & Wankhade, K. K. (2012). Intrusion detection system using new ensemble boosting approach.
International Journal of Modeling and Optimization, 2(4).
Folino, G., Pizzuti, C., & Spezzano, G. (2010). An ensemble-based evolutionary framework for coping with
distributed intrusion detection. Rende, Italy: Institute for High Performance Computing and Networking.
Gaikwad, N. Z. D. P. (2015). Comparison of stacking and boost svm method for KDD Dataset. International
Journal of Engineering Research and General Science, 3(3).
Hu, W., Gao, J., Wang, Y., Wu, O., & Maybank, S. (2014). Online adaboost-based parameterized methods for dynamic distributed network intrusion detection. IEEE Transactions on Cybernetics, 44(1). PMID:23757534
Joshi, S. A., & Pimprale, V. S. (2013, January). Network Intrusion Detection System(NIDS) based on Data
Mining. International Journal of Engineering Science and Innovative Technology, 2(1).
Karimi, Z., Kashani, M. M. R., & Harounabadi, A. (2013). Feature Ranking in Intrusion Detection Dataset using
Combination of Filtering Methods. International Journal of Computers and Applications.
Kumar, K., Kumar, G., & Kumar, Y. (2013). Feature selection approach for intrusion detection system.
International Journal of Advanced trends in Computer Science & Engineering, 2(5).
Lappas, T., & Pelechrinis, K. (2007). Data mining techniques for (network) intrusion detection systems. UC
Riverside.
Patel, A., & Tiwari, R. (2014). Bagging ensemble technique for intrusion detection system. International Journal
for Technological Research In Engineering, 2(4).
Riyad, A. M., & Irfan Ahmed, M. S. (2013). An Ensemble Classification Approach for intrusion Detection,
Coimbatore, India. International Journal of Computers and Applications.
Sadek, R. A., Soliman, M. S., & Elsayed, H. S. (2013). Effective anomaly intrusion detection system based
on neural network with indicator variable and rough set reduction. International Journal of Computer Science
Issues, 10(6).
71
International Journal of Natural Computing Research
Volume 7 • Issue 1 • January-March 2018
Sannady, V., & Gupta, P. (2016). Intrusion Detection Model in Data Mining Based on Ensemble Approach.
Journal of Engineering Technology, 3(3).
Sivaranjani, S., & Pathak, M. R. (2014). Network intrusion detection using data mining technique. International
Journal of Advanced Research in Computer Engineering & Technology, 3(6).
Srivastava, S. (2014, February). Weka: A tool for Data preprocessing, classification, Ensemble, Clustering and
Association Rule Mining. International Journal of Computers and Applications.
Sutha, K., & Tamilselvi, J. J. (2015). A Review of Feature Selection Algorithms for Data Mining Techniques.
International Journal on Computer Science and Engineering, 7(6).
Syarif, I., Zaluska, E., Prugel-Bennett, A., & Wills, G. (2012, July). Application of bagging, boosting and
stacking to intrusion detection. In International Workshop on Machine Learning and Data Mining in Pattern
Recognition (pp. 593-602). Springer.
Visalakshi, S. (2015). Wrapper based feature selection and classification for real time dataset. International
Journal of Emerging Technologies in Computational and Applied Sciences.
Youssef, A., & Emam, A. (2011). Network intrusion detection using data mining and network behaviour analysis.
International Journal of Computer Science & Information Technology, 3(6).
H.P. Vinutha is working as an Assistant Professor in the Department of Computer Science and Engineering at Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India. She received her MTech in Computer Science & Engineering from Kuvempu University, Davangere, Karnataka, India, and completed her BE in Electrical and Electronics Engineering from Kuvempu University, Chikmangalore, Karnataka, India. She is pursuing her PhD at V.T.U., Belagavi, Karnataka, India.
Poornima Basavaraju is a professor and Head of the Department of Information Science and Engineering at Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India. She received her MTech from V.T.U., Belagavi, and her PhD from Kuvempu University, India. Her research interests are in the areas of Data Mining, Big Data Analytics, Fuzzy Systems and Image Processing. She was the principal investigator of the AICTE-funded RPS project "Design and Analysis of Efficient Privacy Preserving Data Mining Algorithms."