
First Edition 2008
© PUTEH SAAD & HISHAMMUDDIN ASMUNI 2008
All rights reserved. No part of the articles, illustrations, or contents of this book may be reproduced in any form or by any means, whether electronic, photocopy, mechanical, or otherwise, without prior written permission from the Deputy Vice-Chancellor (Research and Innovation), Universiti Teknologi Malaysia, 81310 Skudai, Johor Darul Ta'zim, Malaysia. Arrangements are subject to royalty or honorarium terms.
All rights reserved. No part of this publication may be reproduced or transmitted in any
form or by any means, electronic or mechanical including photocopy, recording, or any
information storage and retrieval system, without permission in writing from Universiti
Teknologi Malaysia, 81310 Skudai, Johor Darul Ta’zim, Malaysia.
Perpustakaan Negara Malaysia
Cataloguing-in-Publication Data
Advances in artificial intelligence applications / chief editor: Puteh
Saad ; editor Hishammuddin Asmuni.
ISBN 978-983-52-0623-8
1. Artificial intelligence. I. Puteh Saad. II. Hishammuddin Asmuni.
006.3
Editor: Puteh Saad & Colleagues
Cover Design: Mohd Nazir Md. Basri & Mohd Asmawidin Bidin
Typeset by
Fakulti Sains Komputer & Sistem Maklumat
Published in Malaysia by
PENERBIT
UNIVERSITI TEKNOLOGI MALAYSIA
34 – 38, Jln. Kebudayaan 1,Taman Universiti
81300 Skudai,
Johor Darul Ta’zim, MALAYSIA.
(PENERBIT UTM is a member of the MALAYSIAN BOOK PUBLISHERS ASSOCIATION, membership no. 9101)
Printed in Malaysia by
UNIVISION PRESS SDN. BHD
Lot. 47 & 48, Jalan SR 1/9, Seksyen 9,
Jalan Serdang Raya, Taman Serdang Raya,
43300 Seri Kembangan,
Selangor Darul Ehsan, MALAYSIA.
CONTENTS
PREFACE ..................................................................................... VII
CHAPTER 1 ......................................................................................... 1
NEURAL NETWORK FOR CLASSIFYING STUDENT
LEARNING CHARACTERISTICS IN E-LEARNING
NOR BAHIAH HJ AHMAD
SITI MARIYAM HJ SHAMSUDDIN
CHAPTER 2 ....................................................................................... 20
A HYBRID ARIMA AND NEURAL NETWORK FOR YIELDS
PREDICTION
RUHAIDAH SAMSUDIN
ANI SHABRI
CHAPTER 3 ....................................................................................... 36
A PERFORMANCE STUDY OF ENHANCED BP
ALGORITHMS ON AIRCRAFT IMAGE CLASSIFICATION
PUTEH SAAD
NURSAFAWATI MAHSHOS
SUBARIAH IBRAHIM
RUSNI DARIUS
CHAPTER 4 ....................................................................................... 63
ANFIS FOR RICE YIELDS FORECASTING
RUHAIDAH SAMSUDIN
PUTEH SAAD
ANI SHABRI
CHAPTER 5 ....................................................................................... 82
HYBRIDIZATION OF SOM AND GENETIC ALGORITHM TO
DETECT UNCERTAINTY IN CLUSTER ANALYSIS
E. MOHEBI
MOHD. NOOR MD. SAP
CHAPTER 6 ..................................................................................... 102
A MINING-BASED APPROACH FOR SELECTING BEST
RESOURCES NODES ON GRID RESOURCE BROKER
ASGARALI BOUYER
MOHD NOOR MD SAP
CHAPTER 7 ..................................................................................... 127
RELEVANCE FEEDBACK METHOD FOR CONTENT-BASED
IMAGE RETRIEVAL
ALI SELAMAT
PEI-GEOK LIM
CHAPTER 8 ..................................................................................... 159
DETECTING BREAST CANCER USING TEXTURE
FEATURES AND SUPPORT VECTOR MACHINE
AL MUTAZ ABDALLA
SAFAAI DERIS
NAZAR ZAKI
INDEX ......................................................................................... 172
PREFACE
Various authors offer their own views of Artificial Intelligence
(AI). Russell and Norvig summarize AI as being embedded in an agent
that is able to think humanly and rationally and to act humanly and
rationally. Owing to its wide coverage, AI is divided into the following
branches: natural language processing, knowledge representation,
automated reasoning, machine learning, computer vision and
robotics. In this book, machine learning is applied to solve a myriad
of problems occurring in various domains ranging from e-learning,
prediction, bioinformatics, content-based retrieval, data mining,
image recognition and grid computing. A number of algorithms have been
developed in machine learning. The algorithms covered in this
volume are: Artificial Neural Network (ANN), Fuzzy Logic (FL),
Genetic Algorithm (GA), and Support Vector Machine (SVM). ANN
is suitable to solve problems related to learning and adaptation. FL is
suited to solve decision making problems that have imprecise
knowledge. GA is utilized to solve optimization problems that are
categorized as NP-Hard. SVM is an excellent classifier for
high-dimensional data.
There are many ANN algorithms available. They can be
categorized into supervised, unsupervised and reinforcement
learning. Each type of learning is suited to different applications. In
supervised learning, the learning tasks consist of two phases; training
and testing. The learning rule used here is called an Error Correction
technique, also known as the Gradient Descent technique. The
generalization capability achieved during the training phase is tested
during the testing phase by replacing the input samples with new
data. Back Propagation algorithm is a favourite among researchers
since it is effective and is widely used to solve various kinds of
problems. In this volume, the Back Propagation (BP) algorithm is
utilized in Chapter 1 to classify students' learning styles based on
their learning preferences and behavior. Chapter 2 demonstrates its
potential in solving a yield forecasting problem. BP can also solve
image classification problems, as reported in Chapter 3.
On the other hand, in unsupervised learning the network by
itself explores the correlations between patterns in the data and
organizes the patterns into classes based on those correlations,
without being trained. The common learning rules implemented in
unsupervised learning are Hebbian rules and competitive rules. The
Self-Organising Map (SOM) is an example of an algorithm that
implements competitive rules. Chapter 5 highlights the capability of
SOM integrated with a genetic algorithm to detect uncertainty in
cluster analysis.
Genetic Algorithms (GA) are directed random search techniques
used to look for parameters that provide an optimal solution to an
NP-Hard or NP-Complete problem. GA begins with a set of solutions
(represented by chromosomes) called a population. Solutions from one
population are taken and used to form a new population. The solutions
selected to form new solutions (offspring) are chosen according to
their fitness: the more suitable they are, the more chances they have
to reproduce. Chromosomes are filtered iteratively using mutation and
crossover operators until chromosomes that fulfill a desired objective
are found. In Chapter 5, GA is combined with SOM to detect uncertainty
in cluster analysis.
Fuzzy logic is a set of mathematical principles for knowledge
representation based on degrees of membership. Unlike two-valued
Boolean logic, fuzzy logic is multi-valued. It deals with degrees of
membership and degrees of truth. FL uses the continuum of logical
values between 0 (completely false) and 1 (completely true). Chapter
6 illustrates the utilization of FL in constructing a fuzzy decision
tree to select the best resource nodes in grid computing.
Support Vector Machines (SVMs) are a set of related
supervised learning methods used for classification and regression.
SVM constructs a hyperplane in an n-dimensional space
separating two sets of input vectors so as to maximize the margin
between the two data sets. To calculate the margin, two parallel
hyperplanes are constructed, one on each side of the separating
hyperplane, which are "pushed up against" the two data sets.
Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the neighboring data points of both classes,
since in general the larger the margin, the better the generalization
error of the classifier. Chapters 7 and 8 describe the application of
SVM to two different problems. Chapter 7 uses SVM to provide
relevance feedback to the user for content-based retrieval of images.
Chapter 8 classifies cancerous data samples based on texture features.
Puteh Saad
Hishammuddin Asmuni
Faculty of Computer Science and Information Systems, Universiti
Teknologi Malaysia
2008
1
NEURAL NETWORK FOR CLASSIFYING STUDENT LEARNING CHARACTERISTICS IN E-LEARNING
Nor Bahiah Hj Ahmad
Siti Mariyam Hj Shamsuddin
1.1 INTRODUCTION
Neural Network (NN) is an information processing paradigm inspired by
the way biological nervous systems, such as the brain, process
information received through the senses of the human body. It has
been widely used in many application areas such as automotive,
aerospace, banking, medicine, robotics, electronics and
transportation. NN is able to learn complex non-linear input-output
relationships and is adaptive to its environment.
NN has been extensively used for user modeling, mainly for
classification and recommendation, in order to group users with the
same characteristics and create profiles (Frias-Martinez et al.,
2005). Some examples are Bidel, Lemoine, and Piat (2003), who used NN
to classify user navigation paths, and Stathacopoulou et al. (2006)
and Villaverde et al. (2006), who used NN to assess students'
learning styles.
A problem which arises when trying to apply NN to model human
behavior is knowledge representation (Stathacopoulou et al., 2006).
The black-box characteristics of NN do not help much, since the
learned weights are often difficult for humans to interpret. To
alleviate the situation, the back-propagation network (BPNN), a
supervised learning algorithm, reduces the global error produced by
the network over the weight space. This chapter discusses the
implementation of BPNN to represent and detect students' learning
styles in a web-based education system.
A learning system that provides learning resources according to the
Felder-Silverman (FS) learning style has been developed and tested on
Universiti Teknologi Malaysia (UTM) students taking the Data
Structure subject. In this chapter, we describe the classification of
the students' learning styles based on their learning preferences and
behavior.
Learning style has become a significant factor contributing to
learner progress (Magoulas et al., 2003) and an important
consideration when designing an on-line learning system. It is
important to diagnose students' learning styles because some students
learn more effectively when taught with personalised methods.
Information about learning styles can help the system become more
sensitive to the differences among the students using it.
Understanding learning styles can improve the planning, production
and implementation of educational experiences, so they are more
appropriately tailored to students' expectations, in order to enhance
their learning, retention and retrieval (Carver and Howard, 1996).
Typically, questionnaires are used to diagnose learning
characteristics. However, questionnaires are a time-consuming and
unreliable method for acquiring learning style characteristics and
may not be accurate (Villaverde et al., 2006; Stash and de Bra, 2004;
Kelly and Tangney, 2004). Once the profile is generated, it is static
and does not change regardless of user interaction. Nevertheless, a
student's learning characteristics change when given different tasks
in an on-line learning environment.
In the first section of this chapter, we describe the
Felder-Silverman learning style model. Then the process of capturing
and analyzing student behavior while learning using hypermedia is
outlined. The subsequent section discusses the analysis of the
distribution of the learners' learning styles, preferences and
navigation behavior. The classification of students based on the
integration of FS features using BPNN is then described. The chapter
concludes with a brief discussion and the main conclusions drawn from
the experiment conducted.
1.2 FELDER SILVERMAN LEARNING STYLE MODEL
The Felder-Silverman (FS) learning style model was developed by
Felder and Silverman in 1988. This model categorizes a student's
dominant learning style along a scale of four dimensions:
active-reflective (how information is processed), sensing-intuitive
(how information is perceived), visual-verbal (how information is
presented)
and global–sequential (how information is understood).
Table 1.1 describes the characteristics of students based on
the learning dimensions.
The model has been successfully used in previous studies involving
adaptation of learning material, collaborative learning and
traditional teaching (Felder and Silverman, 1988; Zywno, 2003; Carmo
et al., 2007). Furthermore, a hypermedia learning system that
incorporates learning components such as navigation tools,
presentation of learning material in graphic form, simulation, video,
sound and help facilities can easily be tailored to the FS learning
style dimensions.
Felder and Soloman developed the Index of Learning Styles (ILS)
questionnaire to identify students' learning styles. The objective of
the questionnaire is to determine the dominant learning style of a
student (active-reflective, sensing-intuitive, visual-verbal, and
sequential-global). This study integrates the processing, perception,
input and understanding learning styles to map the characteristics of
the students into 16 learning styles. Table 1.2 lists the 16 learning
styles proposed in the Integrated Felder Silverman (IFS) model. The
rationale for integrating these dimensions is to minimize the time
consumed in diagnosing the learning styles.
Previous research (Villaverde et al., 2006; Kelly and Tangney, 2004;
Garcia et al., 2006; Graf and Kinshuk, 2006; Yannibelli et al., 2006;
Lo and Shu, 2005) attempted to detect learning styles based on
student behavior and interaction while learning on-line. Various
techniques have been used to represent student models, such as neural
networks (Villaverde et al., 2006; Lo and Shu, 2005), Genetic
Algorithms (Yannibelli et al., 2006), FS measurement (Graf and
Kinshuk, 2006), fuzzy logic, Bayesian networks (Kelly and Tangney,
2004; Garcia et al., 2006), and case-based reasoning.
Table 1.1 Felder Silverman learning dimensions and learner characteristics (Felder and Silverman, 1988)

Processing
  Active: Retain and understand information best by doing something active with it, such as discussing it, applying it, or explaining it to others.
  Reflective: Prefer observation rather than active experimentation. Tend to think about information quietly first.

Perception
  Sensor: Like learning facts; often like solving problems by well-established methods and dislike complications and surprises. Patient with details and good at memorizing facts and doing hands-on work. More practical and careful than intuitors.
  Intuitive: Prefer discovering possibilities and relationships. Like innovation and dislike repetition. Better at grasping new concepts and comfortable with abstractions and mathematical formulations. Tend to work faster and more innovatively than sensors.

Input
  Visual: Remember best what they see from visual representations such as graphs, charts, pictures and diagrams.
  Verbal: More comfortable with verbal information such as written texts or lectures.

Understanding
  Sequential: Prefer to access well-structured information sequentially, studying each subject step by step.
  Global: Prefer to learn in large chunks, absorbing material randomly without seeing connections and then suddenly getting it. Able to solve complex problems quickly or put things together in novel ways once they have grasped the big picture.
Table 1.2 Sixteen learning styles based on Felder Silverman learning dimensions

1. Active/Sensor/Visual/Sequential (ASViSq)
2. Reflective/Sensor/Visual/Sequential (RSViSq)
3. Active/Intuitive/Visual/Sequential (AIViSq)
4. Reflective/Intuitive/Visual/Sequential (RIViSq)
5. Active/Sensor/Verbal/Sequential (ASVbSq)
6. Reflective/Sensor/Verbal/Sequential (RSVbSq)
7. Active/Intuitive/Verbal/Sequential (AIVbSq)
8. Reflective/Intuitive/Verbal/Sequential (RIVbSq)
9. Active/Sensor/Visual/Global (ASViG)
10. Reflective/Sensor/Visual/Global (RSViG)
11. Active/Intuitive/Visual/Global (AIViG)
12. Reflective/Intuitive/Visual/Global (RIViG)
13. Active/Sensor/Verbal/Global (ASVbG)
14. Reflective/Sensor/Verbal/Global (RSVbG)
15. Active/Intuitive/Verbal/Global (AIVbG)
16. Reflective/Intuitive/Verbal/Global (RIVbG)
The novelty of our study is to classify the proposed IFS features of
students' learning styles by employing the BPNN technique. Neural
networks have been chosen for the following reasons:
(a) Their ability to recognise patterns in imprecise or not fully
understood data.
(b) Their ability to generalise and learn from specific examples.
(c) Their ability to be updated quickly with extra parameters.
(d) Their speed of execution, which makes them ideal for real-time
applications.
1.3 METHODOLOGY
In order to determine which characteristics of the students can be
used to identify their learning styles, we conducted an experiment on
115 UTM students: 75 Computer Science students and 40 Computer
Engineering students who took the Data Structure subject. During the
study, the students were required to attend lectures, participate in
lab exercises, work in groups to solve given problems, self-study
using the e-learning system, participate in forum discussions and
take on-line quizzes. All materials could be accessed through a
hypermedia learning system which was integrated into the learning
system provided by UTM. During the study, the students were also
required to answer two questionnaires: the ILS questionnaire and a
questionnaire to obtain feedback regarding the system used and the
students' preferences for the learning material.
1.4 DEVELOPMENT OF THE LEARNING SYSTEM
When designing learning material, it is important to accommodate
elements that reflect individual differences in learning
(Bajraktarevic, 2003). Systems such as iWeaver (Wolf, 2003), CS383
(Carver and Howard, 1996), the system of Graf and Kinshuk (2005),
INSPIRE (Papanikolau et al., 2003) and SAVER (Lo and Shu, 2005)
proposed several learning components to be tailored according to the
learning styles of the students. We adopted the components
implemented in iWeaver, CS383 and INSPIRE in our system due to the
success of those studies. The learning resources are structured into
components that are suitable for the processing, perception, input
and understanding learning dimensions. The resource materials
provided in the learning system are as follows:
Forum – Provides a mechanism for active discussions among students.
Animation – Provides simulations of various sorting techniques. The
process of how each sorting technique works can be viewed step by
step according to the chosen algorithm.
Sample Codes – Provide source code that students can view, and
various sorting programs they can actively execute.
Hypertext – Provides learning content organised into theory and
concepts. The learning content consists of topic objectives,
sub-modules and navigation links.
PowerPoint Slideshow – Provides examples and descriptions in the form
of text, pictures and animations.
Exercises – Designed as multiple-choice questions that students can
answer to get hints and feedback regarding their performance.
On-line Assessment – An on-line quiz consisting of multiple-choice
questions, with marks displayed immediately after the student submits
the quiz.
Table 1.3 lists the learning resources and the learning styles
that match the resources.
Table 1.3 Learning resources developed based on FS learning dimensions

Processing
  Active: post and reply in forum; exercises; simulation; code execution.
  Reflective: view forum; hypertext access.

Perception
  Sensor: # of backtracks in hypertext; concrete material (hypertext); access to examples; exercises; exam delivery duration; exam revision.
  Intuitive: abstract material (hypertext).

Input
  Visual: simulation; simulation coverage; code execution.
  Verbal: hypertext; PowerPoint slides.

Understanding
  Sequential: hypertext navigated linearly.
  Global: hypertext coverage; # of visits to course overview.

1.5 ANALYSIS OF THE QUESTIONNAIRE
The purpose of the ILS questionnaire is to determine the learning
styles of UTM students. Figure 1.1 shows the distribution of learning
styles collected from the 115 students taking the Data Structure
subject in UTM. They were required to fill in the ILS questionnaire
in order to determine their learning styles. From the survey, we
found that only 14 learning styles exist among the students. The
majority of the students have the Active/Sensor/Visual/Sequential
learning style, followed by Reflective/Sensor/Visual/Sequential. The
result is consistent with the study done by Zywno (2003), who
concluded that the default learning style is
Active/Sensor/Visual/Sequential.

Figure 1.1 Distribution of learning styles among UTM students
However, in this study we found that no students fall into two
categories of learning style: Active/Intuitive/Verbal/Sequential and
Reflective/Sensor/Verbal/Global. The main reason these two learning
styles are absent is that there are only twelve students with verbal
learning styles, which is not enough to cover all 16 learning styles.
The second questionnaire attempts to get feedback from the students
regarding the system and their preferences. Table 1.4 summarizes the
students' preferences for the learning resources based on their
learning styles.
1.6 CLASSIFICATION USING NEURAL NETWORK
In this study, we used a Back-propagation Neural Network to model the
learning style dimensions. A BPNN is generally composed of an input
layer, one or more hidden layers and an output layer. When the
network is given an input, the updating of activation values
propagates forward from the input layer of processing units through
each internal layer to the output layer of processing units. The
output units then provide the network's response. When the network
corrects its internal parameters, the correction mechanism starts
with the output units and propagates backward through each internal
layer to the input layer. Back-propagation can adapt two or more
layers of weights and uses more sophisticated learning rules. Its
power lies in its ability to train hidden layers.
This initial study explores the feasibility of BPNN for classifying
student learning styles. The first experiment classified the
students' characteristics into the four FS dimensions. The second
experiment classified the students' characteristics into the
integrated FS dimensions. The results of the two experiments are
compared in order to study their performance.
Table 1.4 List of attributes and values for Felder learning dimensions

Active/Reflective:
  A1  Post and reply forum – Much/Few
  A2  # of Exercise visited – Much/Few
  A3  # of Simulation visited – Much/Few
  A4  # of executing Codes – Much/Few
  A5  # of viewing/reading forum – Much/Few
  A6  Hypertext coverage – Much/Few

Sensor/Intuitive:
  A7  # of backtrack in hypertext – Much/Few
  A8  Concrete material (Hypertext) – Much/Few
  A9  Abstract material (Hypertext) – Much/Few
  A10 Access to example – Much/Few
  A11 # of Exercise visited – Much/Few
  A12 Exam delivery duration – Quick/Slow
  A13 Exam revision – Much/Few

Visual/Verbal:
  A14 # of Simulation visited – Much/Few
  A15 # of diagram/picture viewing – Much/Few
  A16 Hypertext coverage – Much/Few
  A17 PowerPoint Slide Access – Much/Few

Sequential/Global:
  A18 Hypertext – navigate linearly – Linear/Global
  A19 Hypertext coverage – Much/Few
  A20 # of visiting Course overview – Much/Few
1.7 DATA DESIGN AND KNOWLEDGE REPRESENTATION
From the analysis of the students' learning component preferences, we
simulated data that represent the characteristics of the students
based on the learning styles. Table 1.4 lists the attributes for the
FS learning dimensions. There are 20 attributes identified, which are
mapped into the 16 learning styles listed in Table 1.2. Attributes
A1-A6 were used to identify Active/Reflective learners, attributes
A7-A13 were used to identify Sensor/Intuitive learners, attributes
A14-A17 were used to identify Visual/Verbal learners and attributes
A18-A20 were used to identify Sequential/Global learners.
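To make this data design concrete, the sketch below shows one plausible encoding of the Much/Few attribute values in Table 1.4 as a binary input vector, and of a 16-style label from Table 1.2 as a 4-bit target (one bit per FS dimension, matching the 4 output neurons used in the experiments below). The encoding scheme and helper names are our own illustrative assumptions, not taken from the chapter.

```python
# A minimal sketch (assumed encoding, not from the chapter): map the 20
# Much/Few-style attributes of Table 1.4 to a binary input vector, and a
# learning-style code such as "ASViSq" to a 4-bit target, one bit per
# FS dimension (Active=1/Reflective=0, Sensor=1/Intuitive=0, ...).

def encode_attributes(values):
    """values: list of 20 strings ('Much'/'Few', 'Quick'/'Slow',
    'Linear'/'Global') in the order A1..A20 of Table 1.4."""
    positive = {"much", "quick", "linear"}
    return [1.0 if v.lower() in positive else 0.0 for v in values]

def encode_style(label):
    """label: one of the 16 codes in Table 1.2, e.g. 'RSViG'."""
    return [
        1.0 if label[0] == "A" else 0.0,       # Active vs Reflective
        1.0 if label[1] == "S" else 0.0,       # Sensor vs Intuitive
        1.0 if "Vi" in label else 0.0,         # Visual vs Verbal
        1.0 if label.endswith("Sq") else 0.0,  # Sequential vs Global
    ]

# Example: an Active/Sensor/Visual/Sequential student
x = encode_attributes(["Much"] * 11 + ["Quick"] + ["Much"] * 5
                      + ["Linear"] + ["Much"] * 2)
t = encode_style("ASViSq")   # -> [1.0, 1.0, 1.0, 1.0]
```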
1.8 EXPERIMENTS AND RESULTS
We simulated data based on the attributes and values listed in Table
1.4 that represent the characteristics of students under the FS
learning model. During the experiment, 80% of the data was used for
training and 20% for testing.
In designing the BP neural network, we took into account parameters
such as the number of training data, the number of hidden layers, and
the number of processing units in the input, hidden and output
layers. Table 1.4 lists the input variables for each learning
dimension, while Table 1.5 shows the structure of the neural networks
trained on the simulated data. In this study, we selected the sigmoid
transfer function, since it is a better choice for classification
problems (Paridah et al., 2001). We implemented $1/(1+e^{-x})$ as the
activation function. We then ran the neural networks on several
unseen data samples; the classification results are shown in Table
1.6. Overall, we found that the classification accuracy for
Active/Reflective is 100%, Sensing/Intuitive 98.5%, Visual/Verbal
100% and Sequential/Global 96%. Based on the accuracy of the testing
results, we conclude that BPNN is able to classify the students'
learning dimensions accurately.
Table 1.5 Neural network architecture for classifying FS dimensions

FS Dimension      | Input Neurons | Hidden Neurons | Learning Rate | Momentum | Error Rate
Active/Reflective | 6 | 4 | 0.2 | 0.7 | 0.005
Sensing/Intuitive | 7 | 5 | 0.2 | 0.7 | 0.005
Visual/Verbal     | 4 | 3 | 0.2 | 0.7 | 0.005
Sequential/Global | 3 | 2 | 0.2 | 0.7 | 0.005
Table 1.6 Classification results

FS Dimension      | Training Accuracy | Testing Accuracy
Active/Reflective | 100% | 100%
Sensing/Intuitive | 100% | 98.5%
Visual/Verbal     | 100% | 100%
Sequential/Global | 100% | 96.0%
In the second experiment, we combined the twenty attributes belonging
to the four FS dimensions in order to classify the students based on
the integrated FS. In this experiment we intended to classify
students into the sixteen learning styles. We simulated 1190 data
samples and divided them into 4 groups. We ran BPNN 4 times on
different testing and training samples. The BPNN architecture for
classifying students based on the integrated FS is listed below:
Backpropagation Neural Network Architecture
Input neurons: 20
Hidden neurons: 12
Output neurons: 4
Learning rate: 0.002
Error rate: 0.005
Figure 1.2 shows the topological structure of the BPNN used for the
classification. The architecture consists of 20 input neurons, 12
hidden neurons, and 4 output neurons. A trial-and-error approach was
used to find a suitable number of hidden neurons that provides the
best classification accuracy.
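For readers who want to reproduce this setup, here is a minimal sketch of the 20-12-4 sigmoid network trained with batch gradient-descent back-propagation, using the learning rate (0.002) and error criterion (0.005) stated above; the weight initialisation, epoch cap and placeholder data are our assumptions, not values from the chapter.

```python
import numpy as np

# Minimal sketch of the 20-12-4 BPNN described above (sigmoid units,
# learning rate 0.002, stop when MSE < 0.005). Initialisation, the
# epoch cap and the random placeholder data are assumptions.
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (20, 12))   # input -> hidden weights
b1 = np.zeros(12)
W2 = rng.uniform(-0.5, 0.5, (12, 4))    # hidden -> output weights
b2 = np.zeros(4)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train(X, T, lr=0.002, err_rate=0.005, max_epochs=5000):
    global W1, b1, W2, b2
    for epoch in range(max_epochs):
        # Forward pass through hidden and output layers.
        Z = sigmoid(X @ W1 + b1)
        Y = sigmoid(Z @ W2 + b2)
        E = T - Y
        mse = np.mean(E ** 2)
        if mse < err_rate:                # stopping criterion
            break
        # Backward pass: gradient of the MSE through the sigmoids.
        d_out = E * Y * (1.0 - Y)
        d_hid = (d_out @ W2.T) * Z * (1.0 - Z)
        W2 += lr * Z.T @ d_out;  b2 += lr * d_out.sum(axis=0)
        W1 += lr * X.T @ d_hid;  b1 += lr * d_hid.sum(axis=0)
    return mse

# X: (n_samples, 20) binary attribute vectors; T: (n_samples, 4) targets.
X = rng.integers(0, 2, (100, 20)).astype(float)   # placeholder data
T = rng.integers(0, 2, (100, 4)).astype(float)
print("final MSE:", train(X, T))
```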
The classification accuracy results were analyzed and are shown in
Table 1.7. The results reveal that the average testing accuracy is
satisfactory at 94.75%. We found that the student categories
classified using the integrated FS dimensions mostly match the
results obtained with the individual FS dimensions. Based on these
results, the integrated FS dimensions can be used to classify student
learning styles accurately and much faster compared to classifying
each learning style dimension individually.
Figure 1.2 Neural Network Architecture

Table 1.7 Results for the classification of IFS

Sample No. | Training Accuracy | Testing Accuracy
Sample 1   | 100% | 96%
Sample 2   | 98%  | 94%
Sample 3   | 98%  | 96%
Sample 4   | 100% | 93%
Average    | 99%  | 94.75%
1.9 CONCLUSION AND FUTURE WORK
This research has identified several issues related to learning
styles, which are an important feature of adaptation in hypermedia
learning environments. Experiments using BPNN showed that it was able
to classify the learning dimensions of a student by examining the
student's interaction with the hypermedia learning system. The
results showed that BPNN performed well in identifying the learning
styles. In future work, we will hybridize Neural Networks and Rough
Sets to improve the classification performance in terms of processing
time. We will also analyze the real behavior and characteristics of
the students as traced from the web log in order to diagnose
students' learning styles.
1.10 REFERENCES
Bajraktarevic, N., Hall, W. and Fullick, P. (2003). Incorporating
Learning Styles in Hypermedia Environment: Empirical Evaluation.
Proceedings of AH2003: Workshop on Adaptive Hypermedia and Adaptive
Web-based Systems, Budapest, Hungary, 41-52.
Carver, C. A., Howard, R. A. and Lavelle, E. (1996). Enhancing
Student Learning by Incorporating Learning Styles into Adaptive
Hypermedia. Proceedings of ED-MEDIA'96 World Conference on
Educational Multimedia and Hypermedia, pp. 118-123.
Felder, R. and Soloman, B. Index of Learning Styles Questionnaire.
Retrieved 6 February, 2006, from
http://www.engr.ncsu.edu/learningstyles/ilsweb.html
Felder, R. and Silverman, L. (1988). Learning and Teaching Styles in
Engineering Education. Engineering Education, 78(7), pp. 674-681.
http://www.ncsu.edu/felderpublic/Papers/LS1988.pdf
Graf, S. and Kinshuk. (2006). An Approach for Detecting Learning
Styles in Learning Management Systems. Proceedings of the
International Conference on Advanced Learning Technologies, IEEE
Computer Society, pp. 161-163.
Kelly, D. and Tangney, B. (2004). Predicting Learning
Characteristics In A Multiple Intelligence Based
Tutoring System. LNCS Volume 3220/2004.
Springer Berlin/Heidelberg.
Lo, J. and Shu P. (2005). Identification Of Learning Styles
Online by Observing Learners’ Browsing
Behaviour Through A Neural Network. British
Journal of Educational Technology. 36(1) 43–55.
Magoulas, G., Papanikolaou, K. and Grigoriadou, M. (2003). Adaptive
Web-Based Learning: Accommodating Individual Differences Through
System's Adaptation. British Journal of Educational Technology,
34(4).
Papanikolau, K., Grigoriadou, M., Kornilakis, H. and Magoulas, G.
(2003). Personalizing the Interaction in a Web-based Educational
Hypermedia System: The Case of INSPIRE. User Modeling and
User-Adapted Interaction, 13, 213-267.
Paridah, S., Nor Bahiah, A., Norazah, Y., Siti Zaiton, M. H. and Siti
Mariyam, S. (2001). Neural Network Application on Knowledge
Acquisition for Adaptive Hypermedia Learning. Chiang Mai Journal of
Science, 28(1), 65-70.
Stash, N. and de Bra, P. (2004). Incorporating Cognitive Styles in
AHA! (The Adaptive Hypermedia Architecture). Proceedings of the
IASTED International Conference on Web-Based Education, 2004,
Austria.
Villaverde, J., Godoy, D. and Amandi, A. (2006). Learning Styles'
Recognition in E-Learning Environments with Feed-Forward Neural
Networks. Journal of Computer Assisted Learning, 22(3), pp. 197-206.
Wolf, C. (2003). iWeaver: Towards an Interactive Web-Based Adaptive
Learning Environment to Address Individual Learning Styles.
Australasian Computing Education Conference (ACE2003), Adelaide,
Australia.
Zywno, M. S. (2003). A Contribution to Validation of Score Meaning
for Felder-Soloman's Index of Learning Styles. Proceedings of the
2003 American Society for Engineering Education Annual Conference and
Exposition. Online at
http://www.ncsu.edu/felderpublic/ILSdir/Zywno_Validation_Study.pdf,
retrieved August 2, 2006.
2
A HYBRID ARIMA AND NEURAL NETWORK FOR YIELDS PREDICTION
Ruhaidah Samsudin
Ani Shabri
2.1 INTRODUCTION
The accuracy of time series forecasting is fundamental to many
decision processes, and hence research into improving the
effectiveness of forecasting models has never stopped (Zhang, 2003).
Recent research activities in artificial neural networks (ANN) have
shown powerful pattern classification and pattern recognition
capabilities. One major application area of ANN is forecasting
(Sharda, 1994). ANN provides an attractive alternative tool for
forecasting researchers and has shown its nonlinear modeling
capability in time series forecasting.
The ARIMA model is one of the most popular models in traditional
time-series forecasting and is often used as a benchmark against
which other models are compared. The popularity of the autoregressive
integrated moving average (ARIMA) model is due to its statistical
properties as well as the well-known Box-Jenkins methodology used in
the model building process. However, the ARIMA model is only a class
of linear model, and thus it can only capture the linear features of
a time series.
ARIMA and ANN models are often compared, with mixed conclusions in
terms of superiority in forecasting performance. Although the ANN
model has achieved success in many time series forecasting tasks, it
has some disadvantages. A survey of the literature shows that both
ARIMA and ANN models have performed well in different cases (Zhang,
1999). Since the real world is highly complex, some linear and
nonlinear patterns exist in a time series simultaneously. It is not
sufficient to use only a nonlinear model for a time series, because
the nonlinear model might miss some linear features of the time
series data (Zou et al., 2007).
In real life, time series often contain both linear and nonlinear
patterns. If this is the case, there is no universal model that is
suitable for all kinds of time series data. Both ARIMA and ANN models
have achieved success in their own linear or nonlinear domains, but
neither ARIMA nor ANN alone can adequately model and predict time
series, since linear models cannot deal with nonlinear relationships,
while the ANN model alone is not able to handle both linear and
nonlinear patterns equally well (Zhang, 2003). The ARIMA model is a
class of linear models that can capture a time series' linear
characteristics, while ANN models are a class of general function
approximators capable of modeling non-linearity and able to capture
nonlinear patterns in time series.
Thus, the objective of this chapter is to develop a hybrid model for
rice yield forecasting. This model combines a linear time series
model (ARIMA) and a nonlinear ANN model, because the ANN and ARIMA
models are complementary.
2.2 ARIMA MODEL
ARIMA models were developed by George Box and Gwilym Jenkins (1970)
and became popular at the beginning of the 1970s. The general ARIMA
model is composed of a seasonal and a non-seasonal part, represented
in the following way:

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d \nabla_s^D x_t = \theta_q(B)\,\Theta_Q(B^s)\,a_t \qquad (1)$$

where $\phi(B)$ and $\theta(B)$ are polynomials of orders p and q,
respectively; $\Phi(B^s)$ and $\Theta(B^s)$ are polynomials in $B^s$
of degrees P and Q, respectively; p is the order of nonseasonal
autoregression; d the number of regular differencings; q the order of
the nonseasonal moving average; P the order of seasonal
autoregression; D the number of seasonal differencings; Q the order
of the seasonal moving average; and s the length of the season. For a
time series forecasting task, the prediction model has the general
form

$$x_t = f(x_{t-1}, x_{t-2}, \ldots, x_{t-p}) + e_t \qquad (2)$$
The Box-Jenkins methodology is basically divided into four stages:
identification, estimation, diagnostic checking and forecasting. The
identification stage involves transforming the data, if necessary, to
improve the normality and stationarity of the time series. The next
step is choosing a suitable model by analyzing both the
autocorrelation function (ACF) and partial autocorrelation function
(PACF) of the stationary series. Once a model is identified, the
parameters of the model are estimated. It is then necessary to check
whether the model assumptions are satisfied. Diagnostic checking
using the ACF and PACF of the residuals was carried out, as described
in Brockwell and Davis (2002). The forecasting model was then used to
compute the fitted and forecast values.
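As an illustration of these four stages, the sketch below walks through a seasonal ARIMA fit with the statsmodels library; the file name, column name and the (0,1,1)x(1,0,1,27) order (borrowed from the model selected later in this chapter) are stand-ins, not part of the original text.

```python
# A minimal Box-Jenkins sketch using statsmodels (assumed data source;
# "rice_yields.csv" and column "yield" are placeholders).
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = pd.read_csv("rice_yields.csv")["yield"]

# 1. Identification: transform/difference, then inspect ACF and PACF.
log_y = np.log(y)
plot_acf(log_y.diff().dropna(), lags=30)
plot_pacf(log_y.diff().dropna(), lags=30)

# 2. Estimation: fit a candidate seasonal model, e.g. the
#    ARIMA(0,1,1)x(1,0,1) with season length 27 used later on.
model = SARIMAX(log_y, order=(0, 1, 1), seasonal_order=(1, 0, 1, 27))
res = model.fit(disp=False)

# 3. Diagnostic checking: the residuals should look like white noise.
plot_acf(res.resid.dropna(), lags=30)
print(res.summary())

# 4. Forecasting: compute fitted values and out-of-sample forecasts.
forecast = np.exp(res.forecast(steps=27))   # back-transform from logs
print(forecast.head())
```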
2.3 THE NEURAL NETWORK FORECASTING MODEL
The ANN model is an information processing system that has
certain performance characteristics in common with biological
neural networks. Also, ANNs have been developed as
generalizations of mathematical models of human cognition or
neural biology.
A typical neural network used in the present study
is shown in Figure 2.1. This is a feed-forward type of network. In
general, it is composed of three layers: the
input layer, the hidden layer and the output layer. Each
layer has a certain number of processing elements called
neurons. Signals are passed between neurons over
connection links. Each connection link has an associated
weight, which in a typical neural net, multiplies the signal
transmitted. Each neuron applies a transfer function to its
net input to determine its output signal.
The input layer of the network consists of n units $(x_1, x_2,
\ldots, x_n)$ and one bias unit $(x_0)$. The hidden layer consists of
p units $(z_1, z_2, \ldots, z_p)$ and one bias unit $(z_0)$, while
the output layer has one unit $(y)$, which is the value to be
predicted. The bias units have the value one as their input signal.
Figure 2.1 Three-layer back propagation neural network. Each hidden
unit computes $z_j = 1/(1+e^{-t})$ with $t = \theta_j +
\sum_{i=1}^{n} x_i w_{ij}$, and the output unit computes $y =
1/(1+e^{-t})$ with $t = \theta + \sum_{j=1}^{m} z_j w_j$.
To build a model for forecasting, the training of the ANN by the
back-propagation process is carried out in the following steps.
Step 1: Set all the weights and threshold levels of the
network.
Step 2: Calculate the actual outputs of the neurons in the
hidden layer.
$$z_j = f\Big(\theta_j + \sum_{i=1}^{n} x_i w_{ij}\Big) \qquad (3)$$
where n is the number of inputs of neuron j in the hidden layer,
$\theta_j$ is the threshold applied to the neuron, $w_{ij}$ is the
weight of the connection between input i and hidden neuron j, and
f(.) is the transfer function, typically the sigmoid function given by

$$f(t) = \frac{1}{1 + e^{-t}} \qquad (4)$$
Step 3: Calculate the actual outputs of the neurons in the output
layer:

$$y = f\Big(\theta + \sum_{j=1}^{m} z_j w_j\Big) \qquad (5)$$

where m is the number of neurons in the hidden layer.
Step 4: Update the weights in the back-propagation network using
error back propagation. A weight is updated using the following
equation:

$$w(t+1) = w(t) + \Delta w(t) \qquad (6)$$

where t is the iteration number and $\Delta w(t)$ is the weight
correction. The output unit then computes its activation to obtain
the signal y.
Step 5: Increase iteration t by one, go back to Step 2 and repeat the
process until the selected error criterion is satisfied.
The relationship between the input observations $(y_{t-1}, y_{t-2},
\ldots, y_{t-p})$ and the output value $y_t$ is as follows:

$$y_t = a_0 + \sum_{j=1}^{q} a_j f\Big(w_{0j} + \sum_{i=1}^{p} w_{ij} y_{t-i}\Big) + \varepsilon_t \qquad (7)$$

where $a_j$ (j = 0, 1, 2, ..., q) is a bias on the jth unit, $w_{ij}$
(i = 0, 1, 2, ..., p; j = 0, 1, 2, ..., q) is the connection weight
between layers of the model, f(.) is the transfer function of the
hidden layer, p is the number of input nodes and q is the number of
hidden nodes (Lai et al., 2006). In effect, the ANN model in (7)
performs a nonlinear functional mapping from the past observations
$(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ to the future value $y_t$,
i.e.,

$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, w) + \varepsilon_t \qquad (8)$$
where w is a vector of all parameters and f is a function
determined by the network structure and connection
weights. Thus, in some sense, the ANN model is equivalent to a
nonlinear autoregressive (NAR) model. A
major advantage of neural networks is their ability to
provide flexible nonlinear mapping between inputs and
outputs. They can capture the nonlinear characteristics of
time series well.
2.4 THE HYBRID FORECASTING METHODOLOGY
Zhang (2003) proposed a hybrid methodology (Hzhang) in which the
model consists of a linear component and a nonlinear component. These
two components have to be estimated from the data. First, the ARIMA
model is used to analyze the linear part of the problem; then the ANN
model is used to model the residuals from the ARIMA model. The
results from the ANN can be used as predictions of the error term of
the ARIMA model. The hybrid model can be defined as

$$Y_t = L_t + N_t(e_t) \qquad (9)$$

where $L_t$ is the linear component and $N_t(e_t)$ is the nonlinear
component fitted to the residuals $e_t$ of the linear model $L_t$ at
time t.
In this study, another hybrid methodology is proposed to forecast the
rice yield series. In the first step, the ANN model is applied to
forecast the series data. After fitting the ANN model, the residuals
are obtained and the ARIMA model is used to fit the residuals. The
results from the ARIMA model are used as predictions of the error of
the ANN model. The new hybrid (HNEW) model may be written as

$$Y_t = N_t + L_t(e_t) \qquad (10)$$

where $N_t$ is the nonlinear component and $L_t(e_t)$ is the linear
component fitted to the residuals $e_t$ of the nonlinear model $N_t$
at time t.
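To make the HNEW recipe in (10) concrete, here is a minimal sketch that fits a neural network first and then models its residuals with ARIMA; the lag count, network size and library choices (scikit-learn, statsmodels) are our assumptions, not the chapter's implementation.

```python
# Minimal HNEW sketch per equation (10): Y_t = N_t + L_t(e_t).
# Assumptions: scikit-learn MLP for the nonlinear part, statsmodels
# ARIMA for the residuals, p = 27 lags; none of this is prescribed
# by the chapter.
import numpy as np
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.arima.model import ARIMA

def make_lags(y, p):
    """Build a lagged design matrix: rows [y_{t-p}, ..., y_{t-1}] -> y_t."""
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
    return X, y[p:]

def hnew_forecast(y, p=27):
    X, target = make_lags(y, p)
    # Step 1: nonlinear component N_t (the ANN forecast).
    ann = MLPRegressor(hidden_layer_sizes=(55,), max_iter=5000,
                       random_state=0).fit(X, target)
    residuals = target - ann.predict(X)
    # Step 2: linear component L_t(e_t), an ARIMA fit on the residuals.
    arima = ARIMA(residuals, order=(0, 1, 1)).fit()
    # Step 3: combined forecast = ANN forecast + predicted residual.
    next_x = y[-p:].reshape(1, -1)
    return ann.predict(next_x) + arima.forecast(steps=1)

y = np.loadtxt("rice_yields.txt")      # placeholder data file
print(hnew_forecast(y, p=27))
```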
2.5 EMPIRICAL RESULTS
The data set used in this study is rice yield data from 1995 to 2003,
giving a total of 432 observations. The data were collected from the
Muda Agricultural Development Authority (MADA), Kedah, Malaysia.
There are 4 areas with 27 locations. The rice yield series is used in
this study to demonstrate the effectiveness of the hybrid method. The
time series of the data is given in Figure 2.2.
Figure 2.2 Rice yield data series (1995-2003)

2.6 ARIMA MODELS
The plots in Figure 2.2 indicate that the time series of rice yields
is non-stationary in mean and variance. The series was transformed
using the natural logarithm, and then differencing was applied. The
sample ACF and PACF for the transformed series are plotted in Figure
2.3. The sample ACF of the transformed data revealed significant
spikes at lags 1 and 27, while the PACF spikes at lags 1, 2 and 27.
The sample ACF and PACF indicate that the rice yields exhibit a
pattern of seasonality.
The plots suggest that an ARIMA model is appropriate. Several models
were identified, and their statistical results during training are
compared in Table 2.1. The criterion used to judge the best model,
the mean square error (MSE), shows that ARIMA(0,1,1)x(1,0,1) is the
relatively best model. This model has both non-seasonal and seasonal
components.

Figure 2.3 ACF and PACF for the differenced series of natural
logarithms
2.7 NEURAL NETWORK MODELS
In this investigation, we only consider the situation of
one-step-ahead forecasting with 27 observations. Before the training
process begins, data normalization is often performed. The following
linear transformation to [0, 1] is used:
$$x_n = \frac{x_0 - x_{\min}}{x_{\max} - x_{\min}} \qquad (11)$$

where $x_n$ and $x_0$ represent the normalized and original data, and
$x_{\min}$ and $x_{\max}$ represent the minimum and maximum values
among the original data. In order to configure the neural network
used in the forecast, the ACF and PACF were used to determine the
maximum number of input neurons used during training (Cadenas and
Rivera, 2007). The input nodes are 3, 9, 18 and 27.
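A small sketch of the min-max normalization in (11), together with its inverse for converting network outputs back to the original scale (the inverse is our addition, implied but not written in the text):

```python
import numpy as np

def normalize(x):
    """Min-max transformation of equation (11): scales data to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min()), x.min(), x.max()

def denormalize(xn, x_min, x_max):
    """Inverse of (11): map normalized values back to the original scale."""
    return xn * (x_max - x_min) + x_min

y_scaled, lo, hi = normalize([120000, 155000, 210000])  # placeholder yields
print(denormalize(y_scaled, lo, hi))                    # recovers the input
```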
Table 2.1 Comparison of ARIMA models' statistical results

ARIMA Model     | Training MSE | Forecasting MSE
(4,1,0)x(1,0,0) | 0.00599 | 0.00118
(4,1,0)x(0,0,1) | 0.00358 | 0.00124
(0,1,1)x(1,0,1) | 0.00335 | 0.00108
(0,1,1)x(0,0,1) | 0.00348 | 0.00120
The hidden layer plays a very important role in many successful
applications of neural networks. The hidden layer allows the neural
network to detect features, to capture the patterns in the data, and
to perform complicated nonlinear mappings between input and output
variables. The most common way of determining the number of hidden
nodes is via experiments or by trial and error. In this study, for
each input layer, the number of hidden nodes was determined using the
formulas "I/2" (Kang, 1991), "I" (Tang and Fishwick, 1993), "2I"
(Wong, 1991) and "2I + 1" (Lippmann, 1987), where I is the number of
input neurons. The sigmoid activation function was used as the
transfer function at both the hidden and output layers.
The network was trained for 5000 epochs using the back-propagation
algorithm with a learning rate of 0.001 and a momentum coefficient of
0.9. The results in terms of the MSE statistic for all the ANN models
are presented in Table 2.2. Analyzing the results during training, it
can be observed that the structures ANN(27,54,1) and ANN(27,55,1)
give slightly better forecasts than the other ANN structures.
Next, a comparison was made between the ARIMA model, the neural
network and the hybrid models for different lead times. A subset
ARIMA $(0,1,1)\times(1,0,1)_{27}$ model was found to be the most
parsimonious among all ARIMA models also found adequate by residual
analysis. A neural network architecture of (27, 55, 1) is used to
model the nonlinear patterns. Finally, by combining the ARIMA
$(0,1,1)\times(1,0,1)_{27}$ model and the (27, 55, 1) neural network,
the hybrid models were obtained. The ARIMA, neural network and hybrid
models for rice yields are discussed here. Figure 2.4 shows a bar
chart comparing the validation MSE of the four forecasting methods.
The comparison between actual and predicted values is given in Figure
2.5.
Table 2.2 Comparison of various ANN models together with the performance statistics for the rice yields

Input | Hidden | Training MSE | Forecasting MSE
4  | 2  | 0.0060 | 0.0038
4  | 4  | 0.0051 | 0.0021
4  | 8  | 0.0049 | 0.0025
4  | 9  | 0.0049 | 0.0017
10 | 5  | 0.0050 | 0.0034
10 | 10 | 0.0049 | 0.0022
10 | 20 | 0.0047 | 0.0025
10 | 21 | 0.0046 | 0.0018
18 | 9  | 0.0050 | 0.0014
18 | 18 | 0.0048 | 0.0013
18 | 36 | 0.0048 | 0.0012
18 | 37 | 0.0046 | 0.0020
27 | 13 | 0.0047 | 0.0012
27 | 27 | 0.0052 | 0.0010
27 | 54 | 0.0047 | 0.0011
27 | 55 | 0.0047 | 0.0011

Note: Boldface in the original indicates where the prediction
performance of the NN is better than that of the others.
From Figure 2.4, it can be seen that the percentage error values of
ARIMA and HNEW are the same and that both perform better than the
Hzhang and NN models in the training process. The results in terms of
MSE show an increase in value as the lead time increases. The HNEW
model performs well for the first 15 lead times (the first 15
forecasted periods); then the error increases gradually. The maximum
percentage error value is 0.17. Both ANN and Hzhang showed poor
performance in the training and forecasting processes.
Figure 2.4 The MSE in the training and forecasting process for
different lead times
In terms of the best forecasting result by MSE, it is observed that
the performance of the hybrid model was better than that of the ARIMA
and ANN models. This shows that performance improves for hybrid
models.
Figure 2.5 Comparison of the predictions made with ARIMA, ANN, Hzhang
and HNEW against the rice yield data (y-axis: rice yield; x-axis:
locations 1-27)
2.8 CONCLUSIONS
This study compares the performance of the ANN model, the statistical
(ARIMA) model and the hybrid models in forecasting the rice yields of
Malaysia. The results show that the ANN model's forecasts are
considerably less accurate than those of the traditional ARIMA model,
which was used as a benchmark. On the other hand, the hybrid model
using the ARIMA model and the error of the NN model is an effective
way to improve forecasting performance in terms of MSE. The hybrid
model takes advantage of the unique strengths of ARIMA and ANN in
linear and nonlinear modeling. For complex problems that have both
linear and nonlinear correlation structures, the combination method
can be an effective way to improve forecasting performance. The
empirical results with the rice yield data set clearly suggest that
the hybrid model performs better than the other two models explored
in forecasting the rice yields.
2.9 REFERENCES
Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis:
Forecasting and Control. Holden-Day, San Francisco, CA.
Brockwell, P. J. and Davis, R. A. (2002). Introduction to
Time Series and Forecasting. Berlin: Springer.
Cadenas, E. and Rivera, W. (2007). Wind Speed
Forecasting in the South Coast of Oaxaca, Mexico.
Renewable Energy. 32:2116-2128.
Kang, S. (1991). An investigation of the Use of
Feedforward Neural Network for Forecasting. Ph.D.
Thesis, Kent State University.
Lai, K. K., Yu, L., Wang, S. and Huang, W. (2006). Hybridizing
Exponential Smoothing and Neural Network for Financial Time Series
Prediction. ICCS 2006, Part IV, LNCS 3994: 493-500.
Lippmann, R. P. (1987). An Introduction to Computing with
Neural Nets. IEEE ASSP Magazine, April, 4-22.
Sharda, R. (1994). Neural Networks for the MS/OR analyst:
An application bibliography. Interfaces. 24(2), 116-30.
Tang, Z. and Fishwick, P. A. (1993). Feedforward Neural
Nets as Models for Time Series Forecasting. ORSA
Journal on Computing, 5(4):374-385.
Wong, F. S. (1991). Time Series Forecasting Using
Backpropagation Neural Network. Neurocomputing.
2:147-159.
Zhang, G. P. (2003). Time Series Forecasting Using a
Hybrid ARIMA and Neural Network Model.
Neurocomputing. 50: 159-175.
Zou, H. F., Xia, G. P., Yang, F. T. and Wang, H. Y. (2007). An
Investigation and Comparison of Artificial Neural Network and Time
Series Models for Chinese Food Grain Price Forecasting.
Neurocomputing, 70: 2913-2923.
3
A PERFORMANCE STUDY OF ENHANCED BP ALGORITHMS ON AIRCRAFT IMAGE CLASSIFICATION
Puteh Saad
Nursafawati Mahshos
Subariah Ibrahim
Rusni Darius
3.1 INTRODUCTION
BP is by far the most widely used algorithm for training MLPs for
pattern recognition and similar tasks. However, it suffers from the
problems of slow convergence, instability and overfitting. In
addition, the optimal values of the learning rate, the momentum, and
the number of hidden layers and their dimensions are obtained through
a trial-and-error method. In this work, we evaluate eleven (11)
enhanced BP algorithms in classifying aircraft images. Each image is
represented using a set of Zernike Moment Invariants.
3.2 BP ISSUES
Slow convergence in the BP algorithm means that the network takes a
long time to learn (to find suitable weights); solving classification
problems on high-dimensional data thus becomes intractable, and the
problem is classified as NP-complete (Looney, 1997). The slow
convergence rate and instability are attributed to the following
reasons:
a) BP is a first-order approximation of the steepest-descent
technique, which is not the best type of steepest-descent method for
finding the minimum of the mean square error (Czap, 2001; Luh and
Zhang, 1999).
b) Another peculiarity of the error surface that impacts the
performance of the BP algorithm is the presence of local minima (i.e.
isolated valleys) in addition to the global minimum. Since BP is
basically a hill-climbing technique, it runs the risk of being
trapped in a local minimum, where every small change in synaptic
weights increases the error function. But somewhere else in the
weight space there exists another set of synaptic weights for which
the error function is smaller than the local minimum in which the
network is stuck. Clearly, it is undesirable to have the learning
process terminate at a local minimum, especially if it is located far
above the global minimum (Ahmed et al., 2001; Ng and Leung, 2001).
c) The BP algorithm depends on the gradient of the instantaneous
error surface in the weight space. The algorithm is therefore
stochastic in nature; that is, it has a tendency to zigzag around the
true direction to a minimum on the error surface. Indeed, BP learning
is an application of a statistical method known as stochastic
approximation, and there exist two fundamental causes for this
property:
i. The error surface may be fairly flat along a weight dimension,
which means that the derivative of the error surface with respect to
that weight is small in magnitude. In such a situation, the
adjustment applied to the weight is small, and consequently many
iterations of the algorithm may be required to produce a significant
reduction in the error performance of the network. Alternatively, the
error surface may be highly curved along a weight dimension, in which
case the derivative of the error surface with respect to that weight
is large in magnitude. In this second situation, the adjustment
applied to the weight is large, which may cause the algorithm to
overshoot the minimum of the error surface (Wen et al., 2000).
ii. The direction of the negative gradient vector may point away from
the minimum of the error surface, so the adjustments applied to the
weights may induce the algorithm to move in the wrong direction.
Consequently, the rate of convergence in BP training tends to be
relatively slow (Sidani and Sidani, 1994).
Overfitting occurs when the error produced during training is small
but, when new data are presented to the network, the error is large.
This phenomenon indicates that the network has memorized the training
samples but has not learned to generalize to new situations
(Patterson, 1996). The phenomenon is due to the type of error
function used and the presence of too many neurons in the hidden
layer. The improvements made to address these issues are described
next.
3.3 BP IMPROVEMENTS
The above issues created fertile ground for innovative research ideas
to improve the standard BP, resulting in a plethora of publications.
Among the improvements made to different aspects of BP are weight
adjustment in the hidden layer, adaptive determination of the
learning rate, improvement of the gradient-descent method and
determination of the momentum. The improvements are performed using
heuristic techniques and numerical optimization techniques.
3.3.1 Heuristic Techniques
Heuristic techniques are developed from an analysis of the
performance of the standard steepest-descent algorithm. One heuristic
technique is the introduction of momentum (Negnevitsky, 2002). Other
heuristic techniques are variable learning rate BP and resilient BP
(Rprop).
The value of the learning rate can be preset or determined
adaptively. Conventionally, the rate is preset to an appropriate
value and remains constant throughout training. The performance of
the algorithm is sensitive to the proper setting of the learning
rate. If the learning rate is set too high, the algorithm may
oscillate and become unstable. If it is set too small, the algorithm
will take a long time to converge. It is not practical to determine
the optimal setting for the learning rate before training, and in
fact the optimal learning rate changes during the training process as
the algorithm moves across the performance surface. Sharda and Patil
(1992) suggested three learning rates (0.1, 0.5, 0.9) and three
momentum values (0.1, 0.5, 0.9) (Zhang et al., 1998).
The performance of the standard BP can be improved by allowing the learning rate to change during the training process. In the variable learning rate scheme, Jacobs (1988) introduced two heuristics. The first heuristic states that if the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, the learning rate parameter should be increased. The second states that if the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, the learning rate parameter should be decreased (Negnevitsky, 2002). An adaptive learning rate attempts to keep the step size as large as possible while keeping learning stable; the learning rate is made responsive to the complexity of the local error surface. An adaptive learning rate requires some changes to the standard BP algorithm. First, the network outputs and errors are computed using the initial learning rate parameter. If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and biases are calculated. However, if the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05) (Negnevitsky, 2002).
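As a rough illustration, the sketch below applies this adaptive rule inside a generic training loop. It is a minimal sketch, not the implementation used in this chapter: err_fn and grad_fn are hypothetical placeholders for the network's error and gradient computations, and the 1.04/0.7/1.05 constants are the typical values quoted above.

def adaptive_lr_step(weights, grad_fn, err_fn, lr, prev_err,
                     ratio=1.04, dec=0.7, inc=1.05):
    # One tentative gradient descent step at the current learning rate.
    candidate = weights - lr * grad_fn(weights)
    err = err_fn(candidate)
    if err > prev_err * ratio:
        # Error grew by more than the predefined ratio:
        # discard the step and decrease the learning rate.
        return weights, lr * dec, prev_err
    if err < prev_err:
        # Error decreased: keep the step and increase the rate.
        return candidate, lr * inc, err
    return candidate, lr, err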
The sigmoid activation functions are characterized by the fact that their slope must approach zero as the input gets large. This raises a problem when using gradient descent to train an MLP with sigmoid functions, since the gradient can have a very small magnitude and therefore causes small changes in the weights and biases, even though the weights and biases are far from their optimal values (Haykin, 1999). The resilient BP (Rprop) eliminates the harmful effects of the magnitude of the partial derivatives by using only the sign of the derivative to determine the direction of the weight update; the magnitude of the derivative has no effect on the weight update. The size of the weight change is determined by a separate update value:
i. Whenever the derivative of the error function with respect to a weight has the same sign for two successive iterations, the update value for that weight and bias is increased by a certain factor.
ii. Whenever the derivative of the error function with respect to a weight changes sign from the previous iteration, the update value for that weight and bias is decreased by a certain factor.
iii. If the derivative is zero, the update value remains the same.
iv. If the weights are oscillating, the weight change is reduced.
v. If the weight change continues in the same direction for several iterations, the magnitude of the weight change is increased.
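The following is a minimal NumPy sketch of these sign-based rules (not the toolbox trainrp routine itself); the growth and shrink factors 1.2 and 0.5 and the step bounds are conventional Rprop defaults assumed here, not values given in the chapter.

import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    # Rules i and v: same derivative sign twice in a row -> larger update value.
    same = grad * prev_grad > 0
    # Rules ii and iv: sign change (oscillation) -> smaller update value.
    flipped = grad * prev_grad < 0
    step = np.where(same, np.minimum(step * eta_plus, step_max), step)
    step = np.where(flipped, np.maximum(step * eta_minus, step_min), step)
    # Rule iii: zero derivative leaves the update value unchanged.
    # Only the sign of the derivative sets the direction of the change.
    w = w - np.sign(grad) * step
    return w, step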
3.3.2 Numerical Optimization Techniques
The problem of finding suitable weights that enable the MLP to generalize is an optimization problem (Stegeman and Buenfield, 1999). The significant numerical optimization algorithms suggested to overcome the weaknesses of the standard BP fall into two categories: conjugate gradient algorithms and quasi-Newton algorithms.
In the standard BP, the error function decreases most rapidly along the negative of the gradient, but this does not guarantee the fastest convergence. Hence, in a conjugate gradient algorithm, a search for a step size that minimizes the error function is performed along conjugate directions, and the step size is updated in every iteration. Versions of the conjugate gradient technique are the Fletcher-Reeves update, the Polak-Ribiere update and Powell-Beale restarts. The various versions are distinguished by the manner in which the ratio of the norm squared of the current gradient to the norm squared of the previous gradient is computed. The conjugate gradient algorithms are normally much faster than variable learning rate BP, although the results are problem dependent. They are often a good choice for networks with a large number of weights (Charalambous, 1992).
Newton's method computes the Hessian matrix (second derivatives) for a feedforward MLP, but this is complex and expensive. The quasi-Newton (or secant) method, however, does not require computation of the Hessian matrix; an approximate Hessian matrix is computed as a function of the gradient. BFGS, named after its founders Broyden, Fletcher, Goldfarb and Shanno, is an example of a successful quasi-Newton method. Although the BFGS algorithm converges faster than the conjugate gradient method, it requires more computation and more storage: the approximate Hessian matrix of dimension n × n must be stored, where n is the number of weights and biases, hence its storage requirement is high (Ribert, 1999).
The one-step secant (OSS) algorithm was proposed to overcome this weakness of the BFGS algorithm by not storing the complete Hessian matrix. It assumes that at each iteration the previous Hessian was the identity matrix, so the new search direction can be calculated without computing a matrix inverse. The algorithm is considered a compromise between the conjugate gradient and quasi-Newton methods: it requires less storage and computation per iteration than the BFGS algorithm but slightly more storage and computation per epoch than the conjugate gradient algorithms.
The Levenberg-Marquardt algorithm instead estimates the Hessian matrix from the first derivatives computed through a standard BP pass; a Jacobian matrix is used to store the first derivatives. The algorithm appears to be the fastest method for training moderate-sized MLPs (up to several hundred weights), but it suffers from a high storage requirement, since it needs to store the Jacobian matrix. The size of the Jacobian matrix is Q × n, where Q is the number of training samples and n is the number of weights and biases in the network. To avoid the high storage requirement, the reduced-memory Levenberg-Marquardt algorithm has been suggested. It splits the Jacobian matrix into two submatrices as follows:
H = J^T J = \begin{bmatrix} J_1^T & J_2^T \end{bmatrix} \begin{bmatrix} J_1 \\ J_2 \end{bmatrix} = J_1^T J_1 + J_2^T J_2
The approximate Hessian can then be calculated by summing a series of subterms. Once one subterm has been computed, the corresponding submatrix of the Jacobian can be discarded, so the full Jacobian never has to exist at one time (Marquardt, 1963; Hagan and Menhaj, 1994).
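A minimal NumPy sketch of this accumulation follows; jacobian_block is a hypothetical callback returning the k-th sub-Jacobian, so the full Q × n Jacobian never has to be held in memory at once.

import numpy as np

def approx_hessian(jacobian_block, n_weights, n_blocks):
    # Accumulate H = J1'J1 + J2'J2 + ... block by block.
    H = np.zeros((n_weights, n_weights))
    for k in range(n_blocks):
        Jk = jacobian_block(k)   # only one sub-matrix exists at a time
        H += Jk.T @ Jk           # add its subterm, then let Jk be discarded
    return H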
Tables 3.1 to 3.3 summarize the series of algorithms that
are explored in this work.
Table 3.1    Enhanced BP Algorithms

BFG (BFGS quasi-Newton). This algorithm requires more computation in each iteration and more storage than the conjugate gradient methods, although it generally converges in fewer iterations.

CGB (Conjugate Gradient with Powell-Beale restarts). The traincgb routine performs somewhat better than traincgp for some problems, although performance on any given problem is difficult to predict.

CGF (Fletcher-Powell Conjugate Gradient). It updates weight and bias values according to conjugate gradient backpropagation with Fletcher-Reeves updates. It is usually much faster than variable learning rate backpropagation, and sometimes faster than trainrp, although the results will vary from one problem to another.

CGP (Polak-Ribiere Conjugate Gradient). It updates weight and bias values according to conjugate gradient backpropagation with Polak-Ribiere updates.
Table 3.2    Enhanced BP Algorithms (continued)

SCG (Scaled Conjugate Gradient). It requires more iterations to converge than the other conjugate gradient algorithms, but the number of computations in each iteration is significantly reduced because no line search is performed.

GD (Basic gradient descent). A network training function that updates weight and bias values according to gradient descent.

GDM (Gradient descent with momentum). It is a batch algorithm for feedforward networks that often provides faster convergence. Momentum allows a network to respond not only to the local gradient, and allows the network to ignore small features in the error surface. Without momentum a network may get stuck in a shallow local minimum.

GDX (Variable Learning Rate Backpropagation). The function traingdx combines adaptive learning rate with momentum training, and can only be used in batch mode training.
Table 3.3    Enhanced BP Algorithms (continued)

LM (Levenberg-Marquardt). It updates weight and bias values according to Levenberg-Marquardt optimization. The storage requirements of trainlm are larger than those of the other algorithms tested.

OSS (One-Step Secant). This algorithm requires less storage and computation per epoch than the BFGS algorithm, and slightly more storage and computation per epoch than the conjugate gradient algorithms. It can be considered a compromise between full quasi-Newton algorithms and conjugate gradient algorithms.

RP (Resilient Backpropagation). It is generally much faster than the standard steepest descent algorithm. It also has the nice property of requiring only a modest increase in memory: the update values for each weight and bias must be stored, which is equivalent to storing the gradient.
3.4 FEATURE EXTRACTION USING ZERNIKE MOMENT INVARIANT
The Zernike Moment (ZM) is chosen since it is invariant to rotation and insensitive to noise. Another advantage of ZM is the ease of image reconstruction that follows from its orthogonality property (Teague, 1980): ZM is the projection of the image function onto orthogonal basis functions. ZM also has a useful rotation invariance property, whereby the magnitude of a ZM does not change for a rotated image.
The major drawback of ZM is its computational complexity (Mukundan, 1998), which is due to the recursive computation of the radial polynomials. In this study, however, we overcome this problem by adopting a non-recursive computation of the radial polynomials. The computation is based on the relationship between the Geometric Moment Invariant and ZM in order to derive the Zernike Invariant Moment.
The ZM of order p with repetition q for a continuous image function f(x, y) that vanishes outside the unit circle is as shown in Equation (1):

Z_{pq} = \frac{p+1}{\pi} \iint_{x^2 + y^2 \le 1} f(r, \theta)\, V^{*}_{pq}(r, \theta)\, r\, dr\, d\theta \qquad (1)
To compute the ZM of a given image, the center of the image is taken as the origin and pixel coordinates are mapped to the range of the unit circle, i.e. x² + y² ≤ 1. The functions V_pq(r, θ) denote the Zernike polynomials of order p with repetition q, and * denotes the complex conjugate; the Zernike polynomials are defined as functions of the polar coordinates r, θ. Equations relating Zernike and Geometric Moments up to the third order are given below.
Thus |Z_pq|, the magnitude of the ZM, can be taken as a rotation-invariant feature of the underlying image function. The rotation invariants and their corresponding expressions in Geometric Moments are given below up to the third order:

Z_{00} = (1/\pi)\, M_{00}
|Z_{11}|^2 = (2/\pi)^2 \left( M_{10}^2 + M_{01}^2 \right)
Z_{20} = (3/\pi) \left[ 2(M_{20} + M_{02}) - M_{00} \right]
|Z_{22}|^2 = (3/\pi)^2 \left[ (M_{20} - M_{02})^2 + 4 M_{11}^2 \right] \qquad (2)
|Z_{31}|^2 = (12/\pi)^2 \left[ (M_{30} + M_{12})^2 + (M_{03} + M_{21})^2 \right]
|Z_{33}|^2 = (4/\pi)^2 \left[ (M_{30} - 3 M_{12})^2 + (M_{03} - 3 M_{21})^2 \right]
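As a minimal sketch, the invariants in equation (2) can be computed directly once the geometric moments M_pq are available; the moment dictionary and function name below are illustrative, not taken from the chapter.

import numpy as np

def zernike_invariants(M):
    # M is a dict of geometric moments, e.g. M[(2, 0)] is M20.
    pi = np.pi
    phi1 = (1 / pi) * M[(0, 0)]                                   # Z00
    phi2 = (2 / pi) ** 2 * (M[(1, 0)] ** 2 + M[(0, 1)] ** 2)      # |Z11|^2
    phi3 = (3 / pi) * (2 * (M[(2, 0)] + M[(0, 2)]) - M[(0, 0)])   # Z20
    phi4 = (3 / pi) ** 2 * ((M[(2, 0)] - M[(0, 2)]) ** 2
                            + 4 * M[(1, 1)] ** 2)                 # |Z22|^2
    phi5 = (12 / pi) ** 2 * ((M[(3, 0)] + M[(1, 2)]) ** 2
                             + (M[(0, 3)] + M[(2, 1)]) ** 2)      # |Z31|^2
    phi6 = (4 / pi) ** 2 * ((M[(3, 0)] - 3 * M[(1, 2)]) ** 2
                            + (M[(0, 3)] - 3 * M[(2, 1)]) ** 2)   # |Z33|^2
    return [phi1, phi2, phi3, phi4, phi5, phi6]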
3.5 IMPLEMENTATION, RESULTS AND DISCUSSION
The coloured aircraft images downloaded from the internet (www.airliners.net) are converted into gray-level format. Noise is then removed from each image, and the image is subsequently thresholded using the Otsu thresholding algorithm. The aircraft image samples are grouped into three categories based on their types: Category 1 represents commercial aircraft, Category 2 represents cargo aircraft, and Category 3 represents military aircraft. In order to evaluate the intra-class invariance and the robustness of the feature extraction techniques adopted, each aircraft image is perturbed by scaling and rotation factors to generate 12 variations of the same image. An example image with its variations is depicted in Figures 3.1 and 3.2, and Table 3.4 describes each variation. Four different scaling factors are chosen, namely 0.5, 0.75, 1.3 and 1.5. For the rotation factors, angles of 5°, 15°, 45° and 90° are chosen, while four images for each aircraft are perturbed by both factors (0.5 with 5°, 0.75 with 15°, 1.3 with 45° and 1.5 with 90°). Each category of aircraft has 10 different models; hence each category consists of 10 original images and 120 perturbed images, making a total of 390 images.
Figure 3.1    An Aircraft Image and its variations

Figure 3.2    An Aircraft Image and its variations (continued)
Zernike Moment features are extracted from the image samples using equation (2). Each image has a set of six features, denoted φ1 to φ6. Due to space constraints, the features are displayed in two tables: Table 3.5 records the features φ1 to φ3, and Table 3.6 lists the features φ4 to φ6. It is observed that ZM orders 1 and 2 have null values while order 3 is significant; thus only φ3 to φ6 are used for classification by the enhanced BP algorithms.
Table 3.4    Type of variations of aircraft images

Image    Variation
(i)      original image
(ii)     the image is reduced to 0.5 of its original size
(iii)    the image is reduced to 0.75 of its original size
(iv)     the image is enlarged to 1.3 times its original size
(v)      the image is enlarged to 1.5 times its original size
(vi)     the image is rotated by 5°
(vii)    the image is rotated by 15°
(viii)   the image is rotated by 45°
(ix)     the image is rotated by 90°
(x)      the image is reduced to 0.5 and rotated by 5°
(xi)     the image is reduced to 0.75 and rotated by 15°
(xii)    the image is enlarged to 1.3 and rotated by 45°
(xiii)   the image is enlarged to 1.5 and rotated by 90°
A k-fold cross-validation technique is chosen to validate the classification results. The image samples are divided into k subsets, and the cross-validation process is repeated k times so that every subset is used for both training and testing. The number of correct classifications is computed using equation (3), where n refers to the number of test data; if a testing vector is classified correctly, σ(x,y)_t = 1, otherwise σ(x,y)_t = 0. The percentage of correct classification is given by equation (4).
NCC_k = \sum_{t=1}^{n} \sigma(x, y)_t \qquad (3)

PCC = \frac{100}{G} \sum_{k=1}^{4} NCC_k \qquad (4)
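A small sketch of equations (3)-(4) follows, assuming G is the total number of test vectors across the four folds (the chapter does not define G explicitly); the function name is illustrative.

def percent_correct(fold_preds, fold_targets):
    # NCC_k: number of correctly classified test vectors in fold k (eq. 3).
    ncc = [sum(int(p == t) for p, t in zip(preds, targets))
           for preds, targets in zip(fold_preds, fold_targets)]
    G = sum(len(targets) for targets in fold_targets)
    # PCC: percentage of correct classification over all folds (eq. 4).
    return 100.0 * sum(ncc) / G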
Table 3.5    The ZM feature vector of aircraft images, φ1 to φ3

ZMI          φ1        φ2        φ3
Original     0.00000   0.00000   0.496054
5°           0.00000   0.00000   0.495574
15°          0.00000   0.00000   0.495431
45°          0.00000   0.00000   0.497110
90°          0.00000   0.00000   0.495590
0.5          0.00000   0.00000   0.495738
0.75         0.00000   0.00000   0.495770
1.3          0.00000   0.00000   0.495620
1.5          0.00000   0.00000   0.495642
15° × 0.75   0.00000   0.00000   0.496542
45° × 1.3    0.00000   0.00000   0.495091
90° × 1.5    0.00000   0.00000   0.495644
In order to set the architecture of the BP network, the first task is to find a suitable number of hidden nodes. The BP network is trained several times and the results are recorded in Table 3.7. The desirable number of hidden nodes is found to be 28, since it produces the smallest MSE.
Table 3.6    The ZM feature vector of aircraft images, φ4 to φ6

ZMI          φ4         φ5         φ6
Original     0.008847   0.020535   0.009627
5°           0.008694   0.019817   0.008370
15°          0.008739   0.020038   0.009627
45°          0.009031   0.020004   0.007078
90°          0.008648   0.020367   0.006955
0.5          0.008825   0.020038   0.009411
0.75         0.008757   0.019979   0.009484
1.3          0.008701   0.020155   0.009549
1.5          0.008691   0.020191   0.009568
15° × 0.75   0.008945   0.019826   0.005711
45° × 1.3    0.008708   0.019019   0.006961
90° × 1.5    0.008691   0.020185   0.006881
Table 3.7    Number of hidden nodes

Neuron   Epoch   MSE         Time (second)   NCC
7        10000   0.0296929   372.22          82
14       60      0.0198156   6.11            83
21       62      0.0192405   8.97            79
28       33      0.0180998   8.42            80
35       24      0.0196875   9.47            78

Legend: NCC - number of correct classifications
An experiment is further conducted on all the enhanced BP algorithms studied to search for a suitable number of epochs that produces the smallest MSE. Table 3.8 records the findings for 30,000 epochs.
Table 3.8    MSE during BP training at 30,000 epochs

Algorithm   Time (second)   Epoch   MSE
BFG         143.50          2958    0.00998265
LM          13.22           58      0.00969848
RP          43.50           4175    0.00999991
SCG         765.58          30000   0.01253640
CGB         313.84          12020   0.00999982
CGF         617.30          26787   0.01397950
CGP         950.11          30000   0.01631220
OSS         801.38          30000   0.01059640
GD          520.00          30000   0.05695580
GDM         625.78          30000   0.05435680
GDX         742.41          30000   0.03685260

Algorithm               MSE
BFG, LM, RP and CGB     MSE < 0.01
SCG, CGF, CGP, OSS      0.01 < MSE < 0.02
GD, GDM, GDX            MSE < 0.06
It is observed that only four algorithms (BFG, LM, RP and CGB) achieve an MSE of less than 0.01. Another four algorithms (SCG, CGF, CGP, OSS) produce MSEs between 0.01 and 0.02, while the MSE for GD, GDM and GDX stays below 0.06. In order to decrease the MSE further, the maximum number of epochs is increased to 50,000; the findings are tabulated in Table 3.9. The target MSE is therefore fixed at 0.02 and the epoch limit at 50,000. Finally, the parameters used to train the BP network for all algorithms are listed in Table 3.10.
Table 3.9    MSE to train BP at 50,000 epochs

Algorithm   Time (second)   Epoch   MSE
BFG         67.72           1324    0.01999730
LM          7.70            33      0.01809980
RP          16.80           1497    0.01999640
SCG         120.86          5598    0.01999980
CGB         82.77           2876    0.01999930
CGF         209.25          8640    0.02000000
CGP         121.47          4985    0.01999980
OSS         181.27          8032    0.01999510
GD          737.64          50000   0.04936950
GDM         634.44          50000   0.04779880
GDX         665.75          50000   0.03241720
Table 3.10    Parameters to train the enhanced BP algorithms

Parameter                 Value
lr                        0.2
mc                        0.9
Target error              0.02
Number of hidden nodes    28
Epoch                     50000
Based on the parameters listed in Table 3.10, the BP network is trained and tested using all the enhanced BP algorithms. The cross-validation results are recorded in Tables 3.11 to 3.13.
Table 3.11    Classification results

Algo   k   Epoch   MSE         Time (sec)   NCC   PCC     Av Time (sec)
BFG    1   1324    0.0199973   67.72        85    77.95   61.18
       2   925     0.0199992   60.5         81
       3   762     0.0199998   66.2         71
       4   619     0.0199512   50.31        67
LM     1   33      0.0180998   7.7          80    76.15   8.33
       2   26      0.0194518   8.11         78
       3   26      0.0193606   9.55         72
       4   25      0.0196315   7.97         67
Table 3.12    Classification results (continued)

Algo   k   Epoch   MSE         Time (sec)   NCC   PCC     Av Time (sec)
RP     1   1497    0.0199964   16.8         79    80.77   25.77
       2   2554    0.0199973   33.58        79
       3   2132    0.0199987   37.88        80
       4   810     0.0199935   14.8         77
SCG    1   5598    0.0199998   120.86       81    78.21   96.01
       2   3786    0.0199993   90.81        82
       3   4175    0.0199924   134.69       74
       4   1265    0.0199955   37.66        68
CGB    1   2876    0.0199993   82.77        84    79.23   117.27
       2   3724    0.0199920   134.75       81
       3   4001    0.0199982   191.22       75
       4   1494    0.0199930   60.34        69
CGF    1   8640    0.0200000   209.25       84    78.46   163.44
       2   5244    0.0199995   167.13       83
       3   4774    0.0199999   208.39       75
       4   1620    0.0199962   68.97        64
CGP    1   4985    0.0199998   121.47       84    78.46   137.60
       2   5285    0.0199986   166.72       79
       3   4522    0.0199867   199.64       75
       4   1536    0.0199929   62.56        68
Table 3.13    Classification results (continued)

Algo   k   Epoch   MSE         Time (sec)   NCC   PCC     Av Time (sec)
OSS    1   8032    0.0199951   181.27       79    76.67   274.50
       2   9948    0.0199848   268.61       78
       3   11093   0.0199705   508.42       73
       4   4311    0.0199353   139.7        69
GD     1   50000   0.0493695   737.64       53    54.87   848.00
       2   50000   0.0494339   758.11       55
       3   50000   0.0500599   1010.42      56
       4   50000   0.0449906   885.84       50
GDM    1   50000   0.0477988   634.44       53    53.33   813.55
       2   50000   0.0522435   685.25       56
       3   50000   0.0496276   1061.97      52
       4   50000   0.0464307   872.53       47
GDX    1   50000   0.0324172   665.75       61    66.92   726.51
       2   50000   0.0316615   733.31       72
       3   50000   0.0324195   796.33       66
       4   50000   0.0281532   710.66       62
The results are analyzed from two aspects: percentage classification (see Figure 3.3) and the time taken to reach the target output (see Figure 3.4). In terms of percentage classification, RP wins with the highest classification rate, but it loses to LM in terms of computation time. This finding is in accordance with the theory that LM converges faster since it estimates the Hessian matrix from the first derivatives without having to store the full Jacobian matrix; the algorithm splits the Jacobian matrix into two sub-matrices.
Figure 3.3    Percentage Classification of Enhanced BP Algorithms
Figure 3.4    Computation Time against Algorithm
The approximate Hessian can be calculated by summing a series of sub-terms; once one sub-term has been computed, the corresponding sub-matrix of the Jacobian can be discarded. Another observation concerns the performance of BFG, SCG, CGB, CGF, CGP and OSS, whose percentage classification is about five percent lower than that of RP. The GD, GDM and GDX algorithms display poor performance due to the heuristic nature of gradient descent. Hence, among all the enhanced BP algorithms examined, RP (resilient backpropagation) achieves the highest classification rate, while LM wins in terms of computation time.
3.6 REFERENCES
Ahmed, W. A. M., Saad, E. S. M. and Aziz, E. S. A. (2001). Modified back propagation algorithm for learning artificial neural networks. Proceedings of the Eighteenth National Radio Science Conference NRSC 2001. 1: 345-352.
Airc. Aviation Photos. (accessed 10 August 2007). www.airliners.net.
Charalambous, C. (1992). Conjugate gradient algorithm for efficient training of artificial neural networks. Proceedings of the IEEE. 139(3): 301-310.
Czap, H. (2001). Construction and interpretation of multi-layer-perceptrons. IEEE International Conference on Systems, Man, and Cybernetics. 5: 3349-3354.
Hagan, M. T. and Menhaj, M. B. (1994). Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks. 5(6): 989-993.
Haykin, S. (1994). Neural Networks - A Comprehensive Foundation. Toronto, Canada: Maxwell Macmillan.
Looney, C. G. (1997). Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists. New York, Oxford: Oxford University Press.
Luh, P. B. and Li Zhang. (1999). A novel neural learning algorithm for multilayer perceptrons. International Joint Conference on Neural Networks IJCNN '99. 3: 1696-1701.
Marquardt, D. (1963). An algorithm for least squares estimation of non-linear parameters. Journal of the Society for Industrial and Applied Mathematics: 431-441.
Mukundan, R. and Ramakrishnan, K. R. (1998). Moment Functions in Image Analysis. Singapore: World Scientific Publishing.
Negnevitsky, M. (2005). Artificial Intelligence: A Guide to Intelligent Systems. 2nd ed. Harlow: Addison-Wesley.
Ng, S. C. and Leung, S. H. (2001). On solving the local minima problem of adaptive learning by using deterministic weight evolution algorithm. Proceedings of the 2001 Congress on Evolutionary Computation. 1: 251-255.
Patterson, D. W. (1996). Artificial Neural Network Theory and Application. Singapore: Prentice-Hall.
Ribert, A., Stocker, E., Lecourtier, Y. and Ennaji, A. (1999). A survey on supervised learning by evolving multi-layer perceptrons. Proceedings of the Third International Conference on Computational Intelligence and Multimedia Applications, ICCIMA '99: 122-126.
Sidani, A. and Sidani, T. (1994). A comprehensive study of the backpropagation algorithm and modifications. Southcon/94 Conference Record: 80-84.
Teague, M. R. (1980). Image analysis via the general theory of moments. Journal of the Optical Society of America. 70(8): 920-930.
Wen, J., Zhao, J. L., Luo, S. W. and Han, Z. (2000). The improvements of BP neural network learning algorithm. 5th International Conference on Signal Processing Proceedings WCCC-ICSP 2000. 3: 1647-1649.
Zhang, G., Patuwo, B. D. and Hu, M. Y. (1998). Forecasting with artificial neural networks: the state of the art. International Journal of Forecasting. 14: 35-62.
4
ANFIS FOR RICE YIELDS FORECASTING

Ruhaidah Samsudin
Puteh Saad
Ani Shabri
4.1 INTRODUCTION

Almost 90% of rice is produced and consumed in Asia, and 96% in developing countries. In Malaysia, the Third Agriculture Policy (1998-2010) was established to meet at least 70% of Malaysia's demand, a 5% increase over the previously targeted 65%. The remaining 30% comes from imported rice, mainly from Thailand, Vietnam and China (Saad et al., 2006). Raising the level of national rice self-sufficiency has become a strategic issue in the agricultural ministry of Malaysia. The ability to forecast the future enables farm managers to take the most appropriate decisions in anticipation of that future.
The accuracy of time series forecasting is fundamental to many decision processes (Zou et al., 2007). One of the most important and widely used time series models is the artificial neural network (ANN). ANN is being used increasingly in time series forecasting, pattern classification and pattern recognition (Ho et al., 2007). ANN provides an attractive alternative tool for forecasting researchers and has shown its nonlinear modeling capability in time series forecasting.
Another approach is fuzzy logic, which was first developed by Zadeh to explain human thinking and decision making (Sen et al., 2006). Several studies have been carried out using fuzzy logic in hydrology and water resources planning (Chang et al., 2001; Liong et al., 2000; Mahabir et al., 2000; Ozelkan and Duckstein, 2001; Sen et al., 2006).
Recently, an adaptive neuro-fuzzy inference system (ANFIS), which combines ANN and fuzzy logic methods, has been used for several applications, such as database management, system design and the planning and forecasting of water resources (Chang et al., 2006; Chang et al., 2001; Chen et al., 2006; Firat and Gungor, 2007; Firat, 2007; Nayak et al., 2008).
The main purpose of this study is to investigate the applicability and capability of ANFIS and ANN for modeling rice yields time-series forecasting. To verify the application of this approach, rice yields data from 27 stations in Peninsular Malaysia are chosen as the case study.
4.2 ARTIFICIAL NEURAL NETWORK (ANN)
Recently, ANN has been extensively studied and used in time series forecasting; Zhang presented a recent review in this area (Zhang et al., 2003). The major advantage of ANN is its ability to model complex nonlinear relationships without a priori assumptions about the nature of the relationship. The single-hidden-layer feedforward network is the most widely used ANN model for modeling and forecasting. The model is characterized by a network of three layers of simple processing units connected by acyclic links. The relationship between the input observations (y_{t-1}, y_{t-2}, ..., y_{t-p}) and the output value y_t is as follows:

y_t = a_0 + \sum_{j=1}^{q} a_j f\left( w_{0j} + \sum_{i=1}^{p} w_{ij}\, y_{t-i} \right) + \varepsilon_t \qquad (1)
where a_j (j = 0, 1, 2, ..., q) is a bias on the jth unit, w_{ij} (i = 0, 1, 2, ..., p; j = 0, 1, 2, ..., q) are the connection weights between layers of the model, f(·) is the transfer function of the hidden layer, p is the number of input nodes and q is the number of hidden nodes.
Training the network is an essential factor in the success of a neural network. Among the several learning algorithms available, back-propagation has been the most popular and most widely implemented for all neural network paradigms (Zou et al., 2007), and it is the algorithm used in the following experiments.
In fact, the ANN model in (1) performs a nonlinear functional mapping from the past observations (y_{t-1}, y_{t-2}, ..., y_{t-p}) to the future value y_t, i.e.

y_t = f( y_{t-1}, y_{t-2}, ..., y_{t-p}, w ) + \varepsilon_t \qquad (2)
66
Advances in Artificial Intelligence Applications
where w is a vector of all parameters and f is a function
determined by the network structure and connection
weights. Thus, in some senses, the ANN model is
equivalent to a nonlinear autoregressive (NAR) model. A
major advantage of neural networks is their ability to
provide flexible nonlinear mapping between inputs and
outputs. They can capture the nonlinear characteristics of
time series well.
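As an illustration of equations (1)-(2), the following sketch evaluates the one-step-ahead NAR mapping; the logistic transfer function is an assumption (the chapter does not name f explicitly), and the weights are taken as given rather than trained.

import numpy as np

def ann_forecast(y_lags, a0, a, W, w0):
    # y_lags = [y_{t-1}, ..., y_{t-p}]; W has shape (q, p); a, w0 have length q.
    f = lambda z: 1.0 / (1.0 + np.exp(-z))   # assumed logistic transfer function
    hidden = f(w0 + W @ y_lags)              # q hidden-unit activations
    return a0 + a @ hidden                   # forecast of y_t (eq. 1, noise-free)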
4.3 THE ADAPTIVE NEURAL FUZZY INFERENCE SYSTEM (ANFIS)
The ANFIS is a multilayer feed-forward network consisting of nodes and directional links, which combines the learning capabilities of a neural network with the reasoning capabilities of fuzzy logic. Since Zadeh proposed the fuzzy logic approach to describe complicated systems, it has become popular and has been used successfully in various engineering problems (Liong et al., 2000; Chang et al., 2001; Sen and Altunkaynak, 2006; Nayak et al., 2004; Chang and Chang, 2006; Chen et al., 2006; Firat and Gungor, 2007). ANFIS has been used by many researchers to organize the network structure itself and to adapt the parameters of the fuzzy system for many engineering problems, such as the modeling of agricultural time series.
The fuzzy inference system is a rule-based system consisting of three conceptual components: (1) a rule base, containing fuzzy if-then rules; (2) a database, defining the membership functions; and (3) an inference system, combining the fuzzy rules and producing the system results (Sen and Altunkaynak, 2006). The first phase of fuzzy logic modeling is the determination of the membership functions of the input-output variables, the second phase is the construction of the fuzzy rules, and the last phase is the determination of the output characteristics, output membership function and system results (Firat and Gungor, 2007; Murat, 2006).
Figure 4.1    (a) Sugeno's fuzzy if-then rule and fuzzy reasoning mechanism; (b) equivalent ANFIS architecture
Two methods, the back-propagation algorithm and the hybrid-learning algorithm, provide the learning of the ANFIS and the construction of the rules, and are used to determine the membership functions of the input-output variables. The general structure of the fuzzy system is shown in Figure 4.1.
ANFIS has been shown to be powerful in modeling numerous processes, such as rainfall-runoff modeling and real-time reservoir operation (Chang and Chang, 2006; Firat and Gungor, 2007; Chen et al., 2006). ANFIS uses the learning ability of the ANN to define the input-output relationship and construct the fuzzy rules by determining the input structure, while the system results are obtained by the thinking and reasoning capability of fuzzy logic. The hybrid-learning algorithm and a subtractive clustering function are used to determine the input structure; the detailed algorithm and mathematical background of the hybrid-learning algorithm can be found in Jang et al. (1997). There are two types of fuzzy inference system in the literature: the Takagi-Sugeno inference system and the Mamdani inference system. In this study, the Takagi-Sugeno inference system is used for modeling the agricultural time series. The most important difference between these systems is the definition of the consequence parameter: in the Sugeno inference system the consequence parameter is either a linear equation, called a 'first-order Sugeno inference system', or a constant coefficient, called a 'zero-order Sugeno inference system'. Assume the fuzzy inference system includes two inputs, x and y, and one output, z. For the first-order Sugeno inference system, two typical rules can be expressed as

Rule 1: IF x is A1 and y is B1 THEN f1 = p1 x + q1 y + r1
Rule 2: IF x is A2 and y is B2 THEN f2 = p2 x + q2 y + r2
where x and y are the crisp inputs to node i, A_i and B_i are linguistic labels (such as low, medium, high) characterized by convenient membership functions, and p_i, q_i and r_i are the consequence parameters. The structure of this fuzzy inference system is shown in Figure 4.1.
It consists of five layers, as described below.

Input nodes (layer 1). Each node in this layer generates the membership grades of the crisp inputs, which belong to each of the convenient fuzzy sets, using the membership functions. Each node's output O_i^1 is calculated by

O_i^1 = \mu_{A_i}(x), \; i = 1, 2; \qquad O_i^1 = \mu_{B_{i-2}}(y), \; i = 3, 4 \qquad (3)
where \mu_{A_i} and \mu_{B_{i-2}} are the membership functions of the fuzzy sets A_i and B_i respectively. Various membership functions, such as the trapezoidal, triangular, Gaussian and generalized bell membership functions, can be applied to determine the membership grades. If the generalized bell membership function is used, the membership function is given by

O_i^1 = \mu_{A_i}(x) = \frac{1}{1 + \left[ \left( \frac{x - c_i}{a_i} \right)^2 \right]^{b_i}} \qquad (4)

where a_i, b_i and c_i are the parameter set.
Rule nodes (layer 2). In this layer, the AND/OR operator is applied to obtain one output representing the result of the antecedent of a fuzzy rule, i.e. the firing strength. The outputs of the second layer, the firing strengths O_i^2, are the products of the corresponding degrees obtained from layer 1, denoted w_i:

O_i^2 = w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \; i = 1, 2 \qquad (5)
Average nodes (layer 3). The main target here is to compute the ratio of the firing strength of the ith rule to the sum of the firing strengths of all rules. The firing strength in this layer is normalized as

O_i^3 = \bar{w}_i = \frac{w_i}{\sum_i w_i}, \; i = 1, 2 \qquad (6)
Consequent nodes (layer 4). The contribution of the ith rule to the total model output and/or the function defined is calculated by

O_i^4 = \bar{w}_i f_i = \bar{w}_i ( p_i x + q_i y + r_i ), \; i = 1, 2 \qquad (7)

where \bar{w}_i is the ith node output from the previous layer, as given in the third layer, and {p_i, q_i, r_i} is the parameter set of the consequence function, i.e. the coefficients of the linear combination in the Sugeno inference system.
Output nodes (layer 5). This layer is termed the output node; a single node computes the overall output by summing all incoming signals, and it is the last step of the ANFIS. The output of the system is calculated as

f(x, y) = \frac{w_1(x, y) f_1(x, y) + w_2(x, y) f_2(x, y)}{w_1(x, y) + w_2(x, y)} = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} \qquad (8)

O_i^5 = f(x, y) = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \qquad (9)
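Putting the five layers together, a minimal forward-pass sketch of the two-rule, first-order Sugeno system described above might look as follows; the parameter containers are illustrative, not the chapter's implementation.

import numpy as np

def anfis_forward(x, y, mfA, mfB, consequents):
    # mfA, mfB: per-rule (a, b, c) bell parameters; consequents: (p, q, r).
    def bell(v, a, b, c):                     # layer 1, equation (4)
        return 1.0 / (1.0 + (((v - c) / a) ** 2) ** b)
    w = np.array([bell(x, *mfA[i]) * bell(y, *mfB[i])   # layer 2, eq. (5)
                  for i in range(2)])
    w_bar = w / w.sum()                       # layer 3, equation (6)
    f = np.array([p * x + q * y + r for (p, q, r) in consequents])
    return float(w_bar @ f)                   # layers 4-5, equations (7)-(9)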
Similar to ANNs, an ANFIS network can be trained by supervised learning to map a particular input to a specific target output. ANFIS applies the hybrid-learning algorithm to this end, which combines the gradient descent and least-squares methods: gradient descent is used to fit the non-linear input parameters, while the least-squares method is employed to identify the linear output parameters (p_i, q_i, r_i). The antecedent parameters, i.e. the membership functions used in layer 2, are applied to construct the rules of the ANFIS model. Since the input variables within a range might be clustered into several classes, the structure of the input layer needs to be determined accurately. The 'subtractive fuzzy clustering' function, which offers an effective result using fewer rules, is applied to solve this problem in ANFIS modeling (Nayak et al., 2004).
4.4 ANN AND ANFIS
The ANN and ANFIS models for rice yield forecasting were developed using data acquired from the Muda Agricultural Development Authority (MUDA), Kedah, Malaysia, ranging from 1995 to 2001. There are 4 areas with 27 locations, and two types of season that influence the rice yield in Malaysia.
The rice yield series data are used in this study to demonstrate the effectiveness of the hybrid method. These time series come from different locations and have different statistical characteristics. The rice yields data cover 1995 to 2001, giving a total of 351 observations. Given this set of observations made at uniformly spaced time intervals, the locations of rice yield are rescaled so that the time axis becomes the set of integers {1, 2, ..., 432}. For example, the first location in 1995 is written as time 1, the second location in 1995 as time 2, and so on. The time series plot is given in Figure 4.2.
Figure 4.2    Rice Yields Series (1995-2001)
In order to assess the forecasting performance of the different models, the data set is divided into two samples: the first is used for training the network (modeling the time series) and the remainder for testing the performance of the trained network (forecasting). We extract the data from 1995 to 2001, producing 351 observations for training, and keep the remaining 27 observations as the out-of-sample data set for forecasting purposes.
The performance of each model on both the training data and the forecasting data is evaluated according to the mean absolute error (MAE) and the root-mean-square error (RMSE), which are widely used for evaluating the results of time series forecasting. The MAE and RMSE are defined as
MAE = \frac{1}{N} \sum_{t=1}^{N} \frac{| y_t - \hat{y}_t |}{y_t} \qquad (10)

RMSE = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} ( y_t - \hat{y}_t )^2 } \qquad (11)
where y_t and \hat{y}_t are the observed and forecasted rice yields at time t. The criterion for choosing the best model is a relatively small MAE and RMSE.
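These two measures translate directly into code; note that equation (10), as written, normalizes each absolute error by the observed value y_t. A minimal sketch:

import numpy as np

def mae(y, y_hat):
    # Equation (10): mean absolute error relative to the observed yields.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat) / y)

def rmse(y, y_hat):
    # Equation (11): root-mean-square error.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))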
4.5 IMPLEMENTATION OF ANN

In this investigation, we only consider one-step-ahead forecasting with 27 observations. Before the training process begins, data normalization is performed using the linear transformation to [0, 1]:

x_n = \frac{y_0}{y_{max}}

where x_n and y_0 represent the normalized and original data, and y_{max} represents the maximum value among the original data. In order to configure the neural network used in the forecast, the ACF and PACF were used to determine the maximum number of input neurons used during training.
Figure 4.3 presents the ACF and PACF of the data sets for the rice yields time series. The input variables for the ANN are selected from lags with high ACF and PACF. Based on these analyses, a maximum of 27 lags was identified as suitable for use as inputs to the proposed ANN, while a single neuron in the output layer represents the variable being modeled. All the data were normalized to the range 0 to 1. After the input and output variables were selected, an ANN architecture of 27-H-1 was explored for capturing the complex, non-linear and seasonal behaviour of the rice yields data. The network was trained for 5000 epochs using the back-propagation algorithm with a learning rate of 0.001 and a momentum coefficient of 0.9. Table 4.1 shows the performance of the ANN during training while varying the number of neurons in the hidden layer (H).
Table 4.1    Performance variation of a three-layer ANN during training with the number of neurons in the hidden layer

Neurons in hidden layer   RMSE    MAE
3                         14093   0.118
9                         15733   0.129
15                        15043   0.115
21                        14263   0.114
27                        14107   0.114
33                        13794   0.113
39                        13768   0.106
45                        12863   0.099
51                        13199   0.102
57                        13334   0.103
63                        12590   0.097
70                        12791   0.101
It is observed that the performance of the ANN improves as the number of hidden neurons increases. However, too many neurons in the hidden layer may cause an over-fitting problem, in which the network learns and memorizes the data very well but lacks the ability to generalize, while too few neurons may prevent the network from learning at all. An ANN with 63 neurons in the hidden layer therefore seems appropriate.
4.6 IMPLEMENTATION OF ANFIS
The ANFIS configuration is obtained through a trial-and-error process. One of the most important steps in developing a satisfactory forecasting ANFIS model is the selection of the input variables among those available. In this study, four models with various inputs are first trained and tested by the ANFIS method, and the performances of the models for rice yields are compared and evaluated based on training and forecasting performance. The structures of the forecasting models can be expressed as

y(t) = f\left( y(t-1), y(t-2), ..., y(t-k) \right) \qquad (12)

where y(t) represents the rice yields at time t.
We experimented with different numbers of membership functions (two, three, four and five), and various membership function shapes, such as the trapezoidal, sigmoid, Gaussian and generalized bell membership functions, were considered. For the output set, before the defuzzification process, we select linear models, which indicates that the generated model is a Type-I Takagi-Sugeno model.
The results, in terms of various performance statistics for all the ANFIS models, are presented in Tables 4.2 and 4.3. From the experimental results we find that most of the ANFIS configurations suffer from slow convergence, and almost all of them cannot reach the training target, especially when more membership functions and more than 4 inputs are used. The final model was chosen according to the smallest error values. The analysis revealed that three symmetric Gaussian membership functions with four inputs give the lowest RMSE and MAE. The network that performs best is chosen as the final model for forecasting the 27 rice yield observations.
Table 4.2    Different structures of the ANFIS

Membership function: Generalized Bell

MF number   Criterion   2 inputs   3 inputs   4 inputs
2           MAE         0.136      0.122      0.111
            RMSE        16321      16215      13839
3           MAE         0.132      0.118      0.286
            RMSE        16066      14863      42073
4           MAE         16066      0.111      7.590
            RMSE        1.5891     14673      8737933
5           MAE         0.123      0.231      -
            RMSE        16031      34048      -
Table 4.3    Different structures of the ANFIS (continued)

Membership function: Symmetric Gaussian

MF number   Criterion   2 inputs   3 inputs    4 inputs
2           MAE         0.129      0.120       0.111
            RMSE        15988      15726       13782
3           MAE         0.128      0.136       0.101
            RMSE        15954      16628       12945
4           MAE         0.126      0.109       310339
            RMSE        15821      14252       487690670
5           MAE         0.123      Inf         -
            RMSE        15953      15851       -

Membership function: Sigmoid

MF number   Criterion   2 inputs   3 inputs    4 inputs
2           MAE         0.136      0.117       0.119
            RMSE        16231      14626       15684
3           MAE         0.131      9.576       4703.250
            RMSE        16004      13333672    9709706600
4           MAE         0.129      0.339       1.000
            RMSE        16025      86185       101693
5           MAE         0.133      12312       1.000
            RMSE        16910      6795058     101693

Membership function: Trapezoidal

MF number   Criterion   2 inputs   3 inputs    4 inputs
2           MAE         0.129      0.127       0.122
            RMSE        16035      16965       15062
3           MAE         0.142      0.126       0.475
            RMSE        16643      16700       74466
4           MAE         0.131      0.437       11.978
            RMSE        16023      106423      19406510
5           MAE         0.141      2.346       89796.671
            RMSE        27228      1980899     72302242000
4.7 PERFORMANCE COMPARISON
The values predicted by the adaptive neuro-fuzzy inference system (ANFIS) are compared with the observed values, and the forecasting accuracy is evaluated by comparison with the ANN model. The best results of ANFIS during training are compared with the ANN model in order to obtain the best forecasting model of rice yields. The performances of the best models developed by the ANN and ANFIS for the forecasting data sets are summarized in Table 4.4. As can be seen from Table 4.4, the ANFIS model produces smaller errors than the ANN model in terms of both the RMSE and MAE measurements.
Table 4.4    Rice yields forecast results

Performance criterion   ANN         ANFIS
MAE (%)                 0.1103      0.101
RMSE                    16782.406   12945
Figure 4.5 shows the overall summary statistics of the rice yields forecasts for the models using box-plots. It demonstrates that the ANFIS model performance is, in general, accurate and satisfactory, with all error data points lying quite near zero.
4.8 CONCLUSION
The ANN and ANFIS models were developed to predict rice yield production using real data acquired from the Muda Agricultural Development Authority (MUDA), Kedah. The prediction performances of both models are measured in terms of the mean absolute error (MAE) and the root-mean-square error (RMSE).
Figure 4.3    Comparison of the ANN and ANFIS models
Based on the performance of these models, it can be concluded that ANFIS is an effective method for forecasting rice yields. The results suggest that the ANFIS method is superior to the ANN method in time-series modeling. This is because ANFIS combines the learning capabilities of a neural network with the reasoning capabilities of fuzzy logic, and thus has an extended prediction capability compared with single ANN and fuzzy logic techniques. The results show that ANFIS can be applied successfully to develop time-series forecasting models that provide accurate forecasting and modeling of time-series data.
4.9 REFERENCES
Chang, F. J. and Chang, Y. T. (2006). Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Advances in Water Resources 29: 1-10.
Chang, F. J., Hu, H. F. and Chen, Y. C. (2001). Counterpropagation fuzzy-neural network for river flow reconstruction. Hydrological Processes 15: 219-232.
Chen, S. H., Lin, Y. H., Chang, L. C. and Chang, F. J. (2006). The strategy of building a flood forecast model by neuro-fuzzy network. Hydrological Processes 20: 1525-1540.
Firat, M. and Gungor, M. (2007). River flow estimation using adaptive neuro-fuzzy inference system. Mathematics and Computers in Simulation 75(3-4): 87-96.
Firat, M. (2007). Watershed modeling by adaptive neuro-fuzzy inference system approach. PhD thesis, Pamukkale University, Turkey (in Turkish).
Ho, S. L., Xie, M. and Goh, T. N. (2002). A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers and Industrial Engineering 42: 371-375.
Jang, J. S. R., Sun, C. T. and Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice Hall.
Liong, S. Y., Lim, W. H., Kojiri, T. and Hori, T. (2000). Advance flood forecasting for flood stricken Bangladesh with a fuzzy reasoning method. Hydrological Processes 14: 431-448.
Mahabir, C., Hicks, F. E. and Fayek, A. R. (2000). Application of fuzzy logic to the seasonal runoff. Hydrological Processes 17: 3749-3762.
Murat, Y. S. (2006). Comparison of fuzzy logic and artificial neural networks approaches in vehicle delay modeling. Transportation Research, Part C: Emerging Technologies 14: 316-334.
Nayak, P. C., Sudheer, K. P., Ragan, D. M. and Ramasastri, K. S. (2004). A neuro-fuzzy computing technique for modeling hydrological time series. Journal of Hydrology 29: 52-66.
Ozelkan, E. C. and Duckstein, L. (2001). Fuzzy conceptual rainfall-runoff models. Journal of Hydrology 253: 41-68.
Saad, P., Bakri, A., Kamarudin, S. S. and Jaafar, M. N. (2006). Intelligent Decision Support System for Rice Yield Prediction in Precision Farming. IRPA Report.
Sen, Z. and Altunkaynak, A. (2006). A comparative fuzzy logic approach to runoff coefficient and runoff estimation. Hydrological Processes 20: 1993-2009.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50: 159-175.
Zou, H. F., Xia, G. P., Yang, F. T. and Wang, H. Y. (2007). An investigation and comparison of artificial neural network and time series models for Chinese food grain price forecasting. Neurocomputing 70: 2913-2923.
5
HYBRIDIZATION OF SOM AND GENETIC ALGORITHM TO DETECT OF UNCERTAINTY IN CLUSTER ANALYSIS

E. Mohebi
Mohd. Noor Md. Sap
5.1 INTRODUCTION
The self-organizing map (SOM), proposed by Kohonen (1982), has been widely used in industrial applications such as pattern recognition, biological modeling, data compression, signal processing and data mining (Kohonen, 1997; Mohebi and Sap, 2008a, 2008b; Sap and Mohebi, 2008a). It is an unsupervised and nonparametric neural network approach. The success of the SOM algorithm lies in its simplicity, which makes it easy to understand, simulate and use in many applications. The basic SOM consists of neurons usually arranged in a two-dimensional structure such that there are neighborhood relations among the neurons. After training is complete, each neuron is attached to a feature vector of the same dimension as the input space. By assigning each input vector to the neuron with the nearest feature vector, the SOM is able to divide the input space into regions (clusters) with common nearest feature vectors. This process can be considered as performing vector quantization (VQ) (Gray, 1984). Also, because of the neighborhood relations contributed by the inter-connections among neurons, the SOM exhibits another important property: topology preservation.
Clustering algorithms attempt to organize unlabeled input vectors into clusters such that points within a cluster are more similar to each other than to vectors belonging to different clusters (Pal et al., 1993). Clustering methods are of five types: hierarchical, partitioning, density-based, grid-based and model-based clustering (Han and Kamber, 2000). Rough set theory employs upper and lower thresholds in the clustering process, which results in rough clusters. This technique can also be defined incrementally, i.e. the number of clusters is not predefined by the user.
In this paper, a new two-level clustering algorithm is proposed. The idea is that at the first level the data are trained by the SOM neural network, and the clustering at the second level is a rough set based incremental clustering approach (Ashraf et al., 2006), which is applied to the output of the SOM and requires only a single scan of the neurons. The optimal number of clusters can be found by rough set theory, which groups the given neurons into a set of overlapping clusters (and thereby clusters the mapped data as well). The overlapped neurons are then assigned to the true clusters they belong to by applying a genetic algorithm, which is adopted to minimize the uncertainty that comes from some clustering operations. In our previous work (Sap and Mohebi, 2008a), a hybrid of SOM and rough sets was applied to capture the ambiguity involved in clusters, but the experimental results show that the algorithm proposed here (Genetic Rough SOM) outperforms the previous one.
This paper is organized as follows. Section 5.2 outlines the basics of the self-organizing map algorithm. The basics of incremental clustering and the rough set theory approach are described in Section 5.3. Section 5.4 describes the essence of the genetic algorithm, and the proposed algorithm is presented in Section 5.5. Section 5.6 is dedicated to experimental results, Section 5.7 provides a brief conclusion and future works, and Section 5.8 gives a brief summary.
5.2 SELF ORGANIZING MAP
Competitive learning is an adaptive process in which the neurons in a neural network gradually become sensitive to different input categories, i.e. sets of samples in a specific domain of the input space. A division of neural nodes emerges in the network to represent different patterns of the inputs after training.
The division is enforced by competition among the neurons: when an input x arrives, the neuron that is best able to represent it wins the competition and is allowed to learn it even better. If there exists an ordering between the neurons, i.e. the neurons are located on a discrete lattice, the competitive learning algorithm can be generalized: not only the winning neuron but also its neighboring neurons on the lattice are allowed to learn, and the whole effect is that the final map becomes an ordered map in the input space. This is the essence of the SOM algorithm. The SOM consists of m neurons located on a regular low-dimensional grid, usually one- or two-dimensional. The lattice of the grid is either hexagonal or rectangular.
The basic SOM algorithm is iterative. Each neuron i has a d-dimensional feature vector w_i = [w_{i1}, ..., w_{id}]. At each training step t, a sample data vector x(t) is randomly chosen from the training set, and the distances between x(t) and all feature vectors are computed. The winning neuron, denoted by c, is the neuron whose feature vector is closest to x(t):

c = \arg\min_i \| x(t) - w_i \|, \; i \in \{1, ..., m\} \qquad (1)
A set of neighboring nodes of the winning node is denoted by N_c. We define h_{ic}(t) as the neighborhood kernel function around the winning neuron c at time t. The neighborhood kernel function is a non-increasing function of time and of the distance of neuron i from the winning neuron c. The kernel can be taken as a Gaussian function:

h_{ic}(t) = e^{- \| p_i - p_c \|^2 / ( 2 \sigma(t)^2 )} \qquad (2)
where p_i is the coordinate of neuron i on the output grid and σ(t) is the kernel width. The weight update rule in the sequential SOM algorithm can be written as:

w_i(t+1) = \begin{cases} w_i(t) + \varepsilon(t)\, h_{ic}(t)\, ( x(t) - w_i(t) ) & \forall i \in N_c \\ w_i(t) & \text{otherwise} \end{cases} \qquad (3)
Both the learning rate ε(t) and the neighborhood width σ(t) decrease monotonically with time. During training, the SOM behaves like a flexible net that folds onto the cloud formed by the training data. Because of the neighborhood relations, neighboring neurons are pulled in the same direction, and thus the feature vectors of neighboring neurons resemble each other. There are many variants of the SOM (Yan and Yaoguang, 2005; Sap and Mohebi, 2008b); however, these variants are not considered in this paper because the proposed algorithm is based on the SOM itself, not on a new variant of it.
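A minimal sketch of one sequential training step, combining equations (1)-(3), is given below; for simplicity the Gaussian kernel is applied to all neurons rather than to an explicit neighbor set N_c, which is a common practical simplification rather than the paper's exact formulation.

import numpy as np

def som_step(W, P, x, eps, sigma):
    # W: (m, d) feature vectors; P: (m, 2) grid coordinates; x: (d,) sample.
    c = np.argmin(np.linalg.norm(W - x, axis=1))            # winner, eq. (1)
    h = np.exp(-np.sum((P - P[c]) ** 2, axis=1)
               / (2.0 * sigma ** 2))                        # kernel, eq. (2)
    W = W + eps * h[:, None] * (x - W)                      # update, eq. (3)
    return W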
The 2D map can be easily visualized and thus gives useful information about the input data. The usual way to display the cluster structure of the data is to use a distance matrix, such as the U-matrix (Ultsch and Siemon, 1990). The U-matrix method displays the SOM grid according to the distances between neighboring neurons: clusters can be identified where inter-neuron distances are low, and borders where inter-neuron distances are high. Another method of visualizing cluster structure is to assign the input data to their nearest neurons; some neurons then have no input data assigned to them, and these neurons can be used as the borders of clusters (Zhang and Li, 1993).
5.3 INCREMENTAL CLUSTERING AND ROUGH SET THEORY

5.3.1 Incremental Clustering
Incremental clustering (Jain et al., 1999) is based on the assumption that it is possible to consider data points one at a time and assign them to existing clusters. Thus, a new data item is assigned to a cluster without looking at previously seen patterns, and the algorithm therefore scales well with the size of the data set.
It employs a user-specified threshold and one of the patterns as the starting leader (cluster leader). At any step, the algorithm assigns the current pattern to the most similar cluster (if the distance between the pattern and the cluster's leader is less than or equal to the threshold), or else the pattern itself is added as a new leader if its similarity with the current set of leaders does not qualify it to join any existing cluster. The set of leaders found acts as the prototype set representing the clusters and is used for further decision making. A high-level description of a typical incremental algorithm is given in Figure 5.1 (Stahl, 1986).
An incremental clustering algorithm for dynamic information processing was presented by Can (1993). The motivation behind this work is that, in dynamic databases, items may be added and deleted over time; these changes should be reflected in the partition generated without significantly affecting the current clusters. This algorithm was used to incrementally cluster a database of 12,684 documents.
The quality of a conventional clustering scheme is determined using the within-group error Δ (Sharma and Werner, 1993), given by:

\Delta = \sum_{i=1}^{m} \sum_{u_h, u_k \in C_i} distance(u_h, u_k) \qquad (4)

where u_h and u_k are objects in the same cluster C_i.
Incremental_Clustering (Data, Thr) {
    Leaders = {d1};                       // the first pattern starts the first cluster
    For (i = 2 to N) {
        L = the leader in Leaders nearest to di;
        If (distance(L, di) <= Thr)
            Put di in the cluster of L;   // join the most similar cluster
        Else                              // new cluster
            Add di to Leaders;
    }
}
Figure 5.1    Incremental clustering algorithm

5.3.2    Rough Set Incremental Clustering
This algorithm is a soft clustering method employing rough set theory (Pawlak, 1982). It groups the given data set into a set of overlapping clusters. Each cluster $C \subseteq U$ is represented by a lower approximation and an upper approximation $(A(C), \overline{A}(C))$, where $U$ is the set of all objects under exploration. The lower and upper approximations of each cluster $C_i$ are required to follow some basic rough set properties:
1. $\emptyset \subseteq A(C_i) \subseteq \overline{A}(C_i) \subseteq U$
2. $A(C_i) \cap A(C_j) = \emptyset$, $i \neq j$
3. $A(C_i) \cap \overline{A}(C_j) = \emptyset$, $i \neq j$
4. If an object $u_k \in U$ is not part of any lower approximation, then it must belong to two or more upper approximations.
Note that properties (1)-(4) are not independent; however, enumerating them is helpful in understanding the basics of rough set theory.
The lower approximation $A(C)$ contains all the patterns that definitely belong to the cluster $C$, while the upper approximation $\overline{A}(C)$ permits overlap. Since the upper approximation permits overlap, each set of data points shared by a group of clusters defines an indiscernible set. Thus, the ambiguity in assigning a pattern to a cluster is captured using the upper approximation. Employing rough set theory, the proposed clustering scheme generates soft clusters (clusters with permitted overlap in the upper approximation); see Figure 5.2. A high-level description of a rough incremental algorithm is given in Figure 5.3 (Lingras et al., 2004).
[Figure: a scatter of objects from two classes, a and b, partitioned by a lower threshold and an upper threshold; objects lying between the two thresholds are shared by the upper approximations of more than one cluster]

Figure 5.2    Rough set incremental clustering
For a rough set clustering scheme and given two objects $u_h, u_k \in U$, we have three distinct possibilities:

1. Both $u_k$ and $u_h$ are in the same lower approximation $A(C)$.
2. Object $u_k$ is in the lower approximation $A(C)$ and $u_h$ is in the corresponding upper approximation $\overline{A}(C)$, and case 1 is not applicable.
3. Both $u_k$ and $u_h$ are in the same upper approximation $\overline{A}(C)$, and cases 1 and 2 are not applicable.
Rough_Incremental (Data, lower_Thr, upper_Thr) {
    Leaders = {d1};
    For (i = 2 to N) {
        L = the leader in Leaders nearest to di;
        If (distance(L, di) <= lower_Thr)
            Put di in the lower approximation of L's cluster;
        ElseIf (distance(L, di) <= upper_Thr)
            Put di in the upper approximation of every cluster j
                with distance(Leader_j, di) <= upper_Thr;
        Else                               // new cluster
            Add di to Leaders;
    }
}
Figure 5.3    Rough set incremental clustering algorithm
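A compact Python rendering of the algorithm in Figure 5.3, under the same assumptions as the sketch above (Euclidean distance, the first pattern as the first leader), might look as follows:

import numpy as np

def rough_incremental(data, lower_thr, upper_thr):
    """Rough set incremental clustering: patterns within lower_thr of the
    nearest leader go to that cluster's lower approximation; patterns within
    upper_thr of several leaders go to all those upper approximations."""
    leaders = [data[0]]
    lower = {0: [0]}           # cluster id -> indices in lower approximation
    upper = {0: [0]}           # cluster id -> indices in upper approximation
    for i, x in enumerate(data[1:], start=1):
        d = np.array([np.linalg.norm(x - L) for L in leaders])
        near = [int(k) for k in np.flatnonzero(d <= upper_thr)]
        j = int(d.argmin())
        if d[j] <= lower_thr:
            lower[j].append(i)             # definite member of cluster j
            upper[j].append(i)             # lower approx is inside upper approx
        elif near:
            for k in near:                 # shared by several clusters
                upper[k].append(i)
        else:
            k = len(leaders)               # new cluster
            leaders.append(x)
            lower[k], upper[k] = [i], [i]
    return lower, upper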
For these possibilities, three variants of equation (4) can be defined as follows:

$$\Delta_1 = \sum_{i=1}^{m} \sum_{u_h, u_k \in A(X_i)} \text{distance}(u_h, u_k)$$

$$\Delta_2 = \sum_{i=1}^{m} \sum_{u_h \in A(X_i),\; u_k \in \overline{A}(X_i)} \text{distance}(u_h, u_k) \qquad (5)$$

$$\Delta_3 = \sum_{i=1}^{m} \sum_{u_h, u_k \in \overline{A}(X_i)} \text{distance}(u_h, u_k)$$
The total error of the rough set clustering is then a weighted sum of these errors:

$$\Delta_{total} = w_1 \times \Delta_1 + w_2 \times \Delta_2 + w_3 \times \Delta_3 \qquad (6)$$

where $w_1 > w_2 > w_3$.
Since $\Delta_1$ corresponds to situations where both objects definitely belong to the same cluster, the weight $w_1$ should have the highest value.
5.4    GENETIC ALGORITHM
The genetic algorithm (GA) was proposed by John Holland in the early 1970s. It applies natural evolution mechanisms such as crossover, mutation, and survival of the fittest to optimization and machine learning. The GA provides an efficient population-based search method and has been applied to many optimization and classification problems (Goldberg, 1989).
The general GA process is as follows:

1. Initialize the population of genes.
2. Calculate the fitness of each individual in the population.
3. Select individuals according to their fitness and reproduce them to form a new population.
4. Perform crossover and mutation on the population.
5. Repeat steps (2) through (4) until some stopping condition is satisfied.
The crossover operation swaps parts of the genetic bit strings between two parents; it emulates biological crossover, in which descendants inherit characteristics from both parents. The mutation operation inverts some bits of the whole bit string at a very low rate, just as mutants appear only rarely in the real world.
Figure 5.4 shows how the crossover and mutation operations are applied in the genetic algorithm (a small code sketch of both operators follows the figure). Each individual in the population evolves toward higher fitness generation by generation.
[Figure: single-point crossover of parent strings 0100000010 and 1110010111 yielding 0100000111 and 1110010010; mutation of 0111000110 into 0111010110]

Figure 5.4    Crossover and mutation in the genetic algorithm
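A minimal Python sketch of the two operators on bit strings follows; the single-point cut and the mutation rate used here are illustrative choices:

import random

def crossover(p1, p2):
    """Single-point crossover: swap the tails of two parent bit strings."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(bits, rate=0.001):
    """Bit-flip mutation: invert each bit independently with a low rate."""
    return ''.join('10'[int(b)] if random.random() < rate else b for b in bits)

child1, child2 = crossover('0100000010', '1110010111')
mutant = mutate('0111000110', rate=0.1)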
5.5    ROUGH CLUSTERING OF THE SOM USING GENETIC ALGORITHM
In this paper a rectangular grid is used for the SOM. Before the training process begins, the input data are normalized; this prevents any one attribute from dominating the clustering criterion. The normalization of a pattern $X_i = \{x_{i1}, \ldots, x_{id}\}$, for $i = 1, 2, \ldots, N$, is as follows:

$$X_i = \frac{X_i}{\left\| X_i \right\|} \qquad (7)$$
Once the training phase of the SOM is completed, the output grid of neurons, now stable across network iterations, is clustered by applying the rough set algorithm described in the previous section. The similarity measure used for rough set clustering of neurons is the Euclidean distance (the same measure used for training the SOM). In the proposed method (see Figure 5.5), neurons to which no data are mapped are excluded from the rough set processing.
From the rough set algorithm it can be observed that if two neurons are defined as indiscernible (i.e., they are in the upper approximation of two or more clusters), there is a certain level of similarity between them with respect to the clusters they belong to, and that similarity relation has to be symmetric. Thus, the similarity measure must be symmetric. Through the rough set clustering of the SOM, the overlapped neurons, and correspondingly the overlapped data (the data in the upper approximations), are detected. In the experiments, to calculate errors and uncertainty, the previous equations are applied to the results of the SOM
(clustered and overlapped data). Then, for each overlapped neuron, a gene is generated that represents the alternative distances to each cluster leader. Figure 5.6 shows an example of the genes generated for $m$ overlapped neurons over $n$ existing cluster leaders.
[Figure: SOM grid with lower and upper approximation regions marked around the clusters]

Figure 5.5    Rough clustering of the SOM; the overlapped neurons are highlighted
After the genes have been generated, the genetic algorithm is employed to minimize the following fitness function, which represents the total sum of each $d_j$ of the related gene:

$$F = \sum_{i=1}^{m} \sum_{j=1}^{n} g_i(d_j) \qquad (8)$$
[Figure: each of the m genes is a vector of distances $d_1, d_2, \ldots, d_n$ to the n existing cluster leaders]
Figure 5.6    Generated genes: $m$ is the number of overlapped neurons and $n$ is the number of existing clusters. The highlighted $d_i$ is the optimal one that minimizes the fitness function.
The aim of the proposed approach is to make the genetic rough set clustering of the SOM as precise as possible. Therefore, a precision measure is needed to evaluate the quality of the approach. A possible precision measure can be defined as follows (Pawlak, 1982):
$$\text{certainty} = \frac{\text{Number of objects in lower approximations}}{\text{Total number of objects}} \qquad (9)$$

5.6    EXPERIMENTAL RESULTS
To demonstrate the effectiveness of the proposed clustering algorithm GR-SOM (Genetic Rough set Incremental clustering of the SOM), two phases of experiments have been carried out on the well-known Iris data set from the UCI Machine Learning Repository. The Iris data set has been widely used in pattern classification. It contains 150 four-dimensional data points, divided into three classes of 50 points each. The first class of Iris plant is linearly separable from the other two; the other two classes overlap to some extent.
The first phase of the experiments presents the uncertainty that comes from the data set; in the second phase the errors are generated. The results of GR-SOM and Rough set Incremental SOM (RI-SOM) (Mohebi and Sap, 2008b) are compared to Incremental clustering of the SOM (I-SOM) (Sap and Mohebi, 2008a). The input data are normalized so that the value of each datum in each dimension lies in $[0,1]$. For training, a 10 × 10 SOM with 100 epochs on the input data is used. The general parameters of the genetic algorithm are configured as in Table 5.1.
Table 5.1    General parameters of the genetic algorithm used in the experiments

Population Size          50
Number of Evaluations    10
Crossover Rate           0.25
Mutation Rate            0.001
Number of Generations    100
Figure 5.7 shows the certainty generated from epoch 100 to 500 by equation (9) on this data set. From the obtained certainty it is evident that GR-SOM can efficiently detect the overlapped data that have been mapped to overlapped neurons (Table 5.2).
Table 5.2    Certainty level (%) of I-SOM, RI-SOM and GR-SOM on the Iris data set

Epoch     100     200     300     400     500
I-SOM     33.33   65.23   76.01   89.47   92.01
RI-SOM    67.07   73.02   81.98   91.23   97.33
GR-SOM    69.45   74.34   83.67   94.49   98.01
[Figure: line chart of certainty (%) versus epoch for I-SOM, RI-SOM and GR-SOM]

Figure 5.7    Generated certainty level of I-SOM, RI-SOM and GR-SOM on the Iris data set from epoch 100 to 500
In the second phase, the same initialization of the SOM is used. The errors that come from the data set, according to equations (5) and (6), are generated by the proposed algorithms (Table 5.3). The weights in the weighted sum (6) are configured according to (10).
$$\sum_{i=1}^{3} w_i = 1 \qquad (10)$$

and for each $w_i$ we have:

$$w_i = \frac{1}{6} \times (4 - i).$$
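A small sketch of the weight scheme (10) and the resulting weighted total error (6):

# Weights from equation (10): w_i = (4 - i) / 6, so w1 > w2 > w3 and sum(w) = 1.
w = [(4 - i) / 6 for i in (1, 2, 3)]        # [0.5, 0.333..., 0.166...]
assert abs(sum(w) - 1.0) < 1e-12

def total_error(d1, d2, d3):
    """Weighted total error Delta_total of equation (6)."""
    return w[0] * d1 + w[1] * d2 + w[2] * d3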
Table 5.3    Comparative generated errors on the Iris data set

Data set   Method    Δ1     Δ2     Δ3     Δtotal
Iris       GR-SOM    1.05   0.85   0.04   1.4
           I-SOM     -      -      -      2.8
5.7    CONCLUSION AND FUTURE WORK
In this paper a two-level clustering approach (GR-SOM) has been proposed to predict clusters of high-dimensional data and to detect the uncertainty that comes from overlapping data. The approach is based on rough set theory, which provides a soft clustering that can detect overlapped data in the data set and make the clustering as precise as possible; a GA is then applied to find the true cluster for each overlapped datum. The results of both phases indicate that GR-SOM is more accurate and generates fewer errors compared to crisp clustering (I-SOM). The proposed algorithm accurately detects overlapping clusters. As future work, the overlapped data could be assigned correctly to the true clusters they belong to by assigning fuzzy membership values to the indiscernible set of data. A weight could also be assigned to each data dimension to improve the overall accuracy.
5.8    SUMMARY
Researchers have recently found that, to capture the uncertainty involved in cluster analysis, it is not enough to apply only one threshold to determine the cluster boundaries. In this paper, to reduce the uncertainty, a new combination of the Kohonen Self-Organizing Map (a popular tool for clustering data), rough set theory and a genetic algorithm is proposed. The proposed two-level algorithm, which first uses the SOM to produce the prototypes and then applies rough sets and the genetic algorithm to assign the overlapped data to the true clusters they belong to, is found to be more accurate than the proposed crisp clustering algorithm (I-SOM) and reduces the errors.
5.9    REFERENCES
Asharaf, S., Narasimha, M. M. and Shevade, S. K. (2006). Rough set based incremental clustering of interval data. Pattern Recognition Letters, Vol. 27, pp. 515-519.
Can, F. (1993). Incremental clustering for dynamic information processing. ACM Trans. Inf. Systems (11) 2, pp. 143-164.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Co. Inc.
Gray, R. M. (1984). Vector quantization. IEEE Acoust. Speech, Signal Process. Mag. 1 (2), pp. 4-29.
Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.
Irvine, U. C. (1987). Machine Learning Repository Database. Available at: http://archive.ics.uci.edu [Accessed 12 April 2008].
Jain, A. K., Murty, M. N. and Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (31) (3), pp. 264-323.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biol. Cybern. 43, pp. 59-69.
Kohonen, T. (1997). Self-Organizing Maps. Springer, Berlin, Germany.
Lingras, P. J. and West, C. (2004). Interval set clustering of web users with rough K-means. J. Intelligent Inf. Syst. (23) (1), pp. 5-16.
Mohebi, E. and Sap, M. N. M. (2008a). Hybrid self organizing map for overlapping clusters. In Springer-Verlag Proceedings of the CCIS. Hainan Island, China, 2009. Accepted.
Mohebi, E. and Sap, M. N. M. (2008b). Rough set based clustering of the self organizing map. In IEEE Computer Society Proceedings of the 1st Asian Conference on Intelligent Information and Database Systems. Dong Hoi, Vietnam, 2008.
Pal, N. R., Bezdek, J. C. and Tsao, E. C. K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Networks (4), pp. 549-557.
Pawlak, Z. (1982). Rough sets. Internat. J. Computer Inf. Sci. (11), pp. 341-356.
Sap, M. N. M. and Mohebi, E. (2008a). A novel clustering of the SOM using rough set. In IEEE Proceedings of the 6th Student Conference on Research and Development. Johor, Malaysia, 2008. Accepted.
Sap, M. N. M. and Mohebi, E. (2008b). Outlier detection methodologies: a review. Journal of Information Technology, UTM, Vol. 20, Issue 1, 2008, pp. 87-105.
Sharma, S. C. and Werner, A. (1981). Improved method of grouping provincewide permanent traffic counters. Transportation Research Record 815, Washington D.C., pp. 13-18.
Stahl, H. (1986). Cluster analysis of large data sets. In Classification as a Tool of Research, W. Gaul and M. Schader, Eds. Elsevier North-Holland, Inc., New York, NY, pp. 423-430.
Ultsch, A. and Siemon, H. P. (1990). Kohonen's self organizing feature maps for exploratory data analysis. Proceedings of the International Neural Network Conference, Dordrecht, Netherlands, pp. 305-308.
Zhang, X. and Li, Y. (1993). Self-organizing map as a new method for clustering and data analysis. Proceedings of the International Joint Conference on Neural Networks, Nagoya, Japan, pp. 2448-2451.
Yan and Yaoguang (2005). Research and application of SOM neural network based on kernel function. Proceedings of ICNN&B'05 (1), pp. 509-511.
6

A MINING-BASED APPROACH FOR SELECTING BEST RESOURCES NODES ON GRID RESOURCE BROKER
Asgarali Bouyer
Mohd Noor Md Sap
6.1    INTRODUCTION
Nowadays, Grid computing has been accepted as an infrastructure for performing parallel computing on distributed computational resources (Karl et al., 2001). A Grid has users, resources, and an information service (IS). Grid computing is a technology for creating distributed infrastructures and virtual organizations (VOs) for very large-scale computing or enterprise applications. In a grid environment, the computational resource is the main part of the system; it can be a desktop PC, a cluster machine or a supercomputer. A main goal of grid computing is to enable applications to identify resources dynamically and create distributed computing environments that can utilize computing resources on demand (Karl et al., 2001).
A resource broker is fundamental in any large-scale Grid environment. Since grid resources are geographically distributed and heterogeneous, the task of a Grid resource broker (GRB) and scheduler is to dynamically identify and characterize the available resources, and to select and allocate the most appropriate resources for a given job. In particular, the heterogeneity of resources and the lack of ownership are two major issues that must be dealt with when designing a GRB. In a broker-based management system, brokers are responsible for selecting the best nodes and ensuring the trustworthiness of the service provider. Resource selection is an important issue in a grid environment where a consumer and a service provider are distributed geographically across multiple administrative domains. Choosing a suitable resource for a user job that meets predefined constraints such as deadline, speedup and cost of execution is an important problem in grids (Klaus et al., 2002). Our approach largely addresses some of these problems.
In this paper we do not propose a resource discovery method; instead, we present a novel way of selecting the best nodes from the pool of discovered nodes. Resource selection involves a set of factors including application execution time, available main memory, disk (secondary memory), resource access policies, and so on. Resource selection must also consider information about resource reliability, prediction error probability, and real-time execution. However, these various performance measures can only be considered if the middleware allows its internal scheduling to be adapted to the desired application's services. We have considered all of these factors in our approach. To achieve a better selection, we use a decision tree combined with fuzzy logic theory (Motohide et al., 1994). Induced decision trees are an extensively researched solution to classification tasks, and fuzzy logic techniques are relevant for case representation because they allow imprecise and uncertain feature values.
The rest of this paper is organized as follows. Section 6.2 reviews previous research on resource brokering and scheduling. Section 6.3 describes the fuzzy decision tree algorithm used in our method. Section 6.4 discusses the system design and implementation details of our OGSI-compliant Grid resource broker service. Section 6.5 describes experimental results and Section 6.6 concludes the paper.
6.2    RELATED WORKS
Many projects, such as DI-GRUBER (Dumitrescu and Ian, 2005), eNANOS (Ivan et al., 2005), AppLeS (Henri et al., 2000) and an OGSI-based broker (Seok et al., 2004), have been carried out on grids. In this section we introduce some of these brokers.

DI-GRUBER (Dumitrescu and Ian, 2005), an extension to the GRUBER brokering framework, was developed as a distributed grid USLA-based resource broker that allows multiple decision points to coexist and cooperate in real time. GRUBER has been implemented in both Globus Toolkit 4 (GT4) and Globus Toolkit 3 (GT3). The part of DI-GRUBER that does resource finding and selection is called the GRUBER engine. The GRUBER engine is the main component of the GRUBER architecture; it implements various algorithms to detect available resources and maintains a generic view of resource utilization in the grid (Dumitrescu and Ian, 2005).
GRUBER does not itself perform job submission, but it can be used in conjunction with one of various grid job submission infrastructures. The eNANOS Resource Broker is an OGSI-compliant resource broker developed as a Grid Service and supported by Globus Toolkit (GT2 and GT3) middleware (Ivan et al., 2005). The eNANOS architecture neither uses data mining methods to select the best nodes from the pool of discovered nodes nor is implemented on Web Services (WS) based frameworks.

AppLeS (Application Level Scheduling) focuses on developing scheduling agents for individual Grid applications (Henri et al., 2000). AppLeS agents have an application-oriented scheduling mechanism and use static or dynamic application and resource information to select a set of resources. However, they perform resource discovery and scheduling without considering resource owner policies, and they do not support system-oriented or extensible scheduling policies.
Another resource broker service has been presented by Seok et al. (2004). It is an OGSI-based broker supported by GT3: a new general-purpose OGSI-compliant Grid resource broker service that performs resource discovery and scheduling through close interactions with the GT3 Core and Base Services. This resource broker service considers resource owner policies as well as user requirements on the resources.

The EZ-Grid project (Barbara et al., 2001) applies Globus services to make Grid resource usage easier and more transparent for the user. This is achieved by developing easy-to-use interfaces coupled with brokerage systems to assist the resource selection and job execution process.
Other work has been done in the resource selection field (e.g. Condor-G (James et al., 2001), Nimrod/G, LSF and so forth), but we cannot introduce all of it in this paper. Finally, we note that none of those systems or brokers uses machine learning methods to select the best nodes for the proposed jobs.
6.3    FUZZY DECISION TREE
Induced decision trees are an extensively researched solution to classification tasks. A general decision tree always produces a deterministic result, and this property is not desirable in some applications. Thus, by combining a decision tree with fuzzy logic, we can achieve better decisions. A fuzzy decision tree (FDT) is the generalization of the decision tree to a fuzzy environment. The knowledge represented by a fuzzy decision tree is closer to human classification (Christophe and Bernadette, 2003). In our approach we use an FDT.
6.3.1    Fuzzy Logic (FL)
Essentially, fuzzy logic (FL) is a multi-valued logic that allows intermediate values to be defined between conventional evaluations like yes/no, true/false, black/white, etc. Fuzzy logic is an extension of Boolean logic that replaces binary truth values with degrees of truth. It was introduced in 1965 by Prof. L. Zadeh at the University of California, Berkeley (George and Bo, 1995). The basic notion of fuzzy systems is the fuzzy set; for example, the fuzzy set of climate may consist of members like "Very cold", "Cold", "Warm", "Hot", and "Very hot". The theory of fuzzy sets enables us to structure and describe activities and observations which differ from each other only vaguely, to formulate them in models, and to use these models for various purposes such as problem-solving and decision-making (George and Bo, 1995). We do not discuss such natural extensions of fuzzy sets here; more about fuzzy logic can be found in (Zadeh, 1984).
6.3.2    Fuzzy Decision Tree Algorithm
This algorithm is an extended version of ID3 that operates on fuzzy sets and produces a fuzzy decision tree (FDT). Other researchers (Motohide et al., 1994; Christophe and Bernadette, 2003) have previously applied FDTs in their applications, and their results show that this algorithm is suitable for our approach. There are two important points in building and applying an FDT (Janikow, 1998):

- Selecting the best attribute at each node to develop the tree: there are many criteria for this purpose, but we will use one of them.
- The inference procedure from the FDT: in the classification step for a new sample, we may encounter many leaf nodes, each offering some classes with different confidences for the sample. Thus, the mechanism for selecting the best fit is important here.
Before presenting the algorithm, we state some assumptions and notation:

- The training examples form a set E of N examples. Each example has a number of properties, and every property A_j contains m_j linguistic (fuzzy) terms, from which the set of fuzzy terms per attribute and the output classes follow.
- The set of examples present at a node t is denoted by X.
- For each example x, a membership degree expresses how strongly x belongs to a class c_k.
- For each attribute j, a membership degree expresses how strongly the crisp value of attribute j in example x belongs to each fuzzy term of that attribute.

Four formulas, (1)-(4), defined on these membership degrees are used by the algorithm below.
6.3.2.1    Creating a Fuzzy Decision Tree
Step 1: Start with all the training examples in the root node, with their original weights (the membership degree of each sample in its desired class is initially taken as 1; in general, training examples are used with their initial weights, which need not be 1).

Step 2: If, at a node t with fuzzy set X, one of the conditions below is true, that node is considered a leaf node.
Con1: For the examples in set X, the ratio of the total membership degree in some class to the sum of the membership degrees over all classes is greater than or equal to a threshold θr.

Con2: The sum of the membership degrees of all data in set X is less than a threshold θr.

Con3: No further attribute is available for selection.
Step 3: If none of the conditions of Step 2 holds for the node, the node is developed further. Thus:

Step 3.1: Find all attributes already used on the path from the root node to the current node and remove them from the attribute set, so that only the remaining attributes are candidates for selection.

Step 3.2: Among the remaining attributes A_i, select the attribute that develops the tree best according to the entropy measure (Christophe and Bernadette, 2003).

Step 3.3: Split the set X into subsets, one per fuzzy term of the selected attribute; each element of X carries a coefficient given by its membership in the corresponding fuzzy term.

Step 3.4: For each of these subsets, define a new node and label the edge with the corresponding fuzzy term value (i = 1, 2, ..., m_Amax). The membership degree of each example in the new node is then computed from these coefficients.

Step 3.5: Replace X by each X_i in turn and repeat Steps 2 and 3.
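Since formulas (1)-(4) are not reproduced above, the following Python sketch only illustrates the flavor of Steps 3.2 and 3.3, using a membership-weighted fuzzy entropy as an assumed attribute selection criterion; the array shapes and names are ours, not the chapter's.

import numpy as np

def fuzzy_entropy(memb):
    """Fuzzy entropy at a node, where memb is an (examples x classes) array
    of membership degrees; class proportions are membership-weighted."""
    totals = memb.sum(axis=0)
    p = totals / totals.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_attribute(memb, fuzzy_terms):
    """Step 3.2 (assumed criterion): pick the attribute whose fuzzy split
    gives the lowest weighted child entropy. fuzzy_terms maps an attribute
    name to an (examples x terms) array of fuzzy-term membership degrees."""
    def split_entropy(terms):
        total, h = memb.sum(), 0.0
        for j in range(terms.shape[1]):
            child = memb * terms[:, [j]]   # Step 3.3: weight by term degree
            wgt = child.sum()
            if wgt > 0:
                h += (wgt / total) * fuzzy_entropy(child)
        return h
    return min(fuzzy_terms, key=lambda a: split_entropy(fuzzy_terms[a]))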
6.4    SYSTEM ARCHITECTURE
A general architecture for this approach is shown in Figure 6.1. Our application runs on top of GT3, but it can also be applied to GT4. For now, we have provided a standalone application that works on GT3 for this purpose. The result from every node is sent as an XML document and stored in a Temporary XML Database (TXD).
6.4.1    Miner Application
To do this, we install a Miner Application (MA) on every node of the target grid. The MA contains a small internal database that plays the role of a log file, and one of its primary tasks is writing this log file.

When a node connects to the grid, the MA updates its log file (inserting a new record into the database); when a new job is submitted to the node, the MA updates the related record, because we want to know the number of jobs executed on this node. When the job finishes successfully, or fails for any reason, the MA again updates the log file (there is a Boolean field in the table that is set to TRUE if the related job finished successfully; otherwise the job failed). We have also defined some new tasks for the Grid Resource Broker (GRB), which we call the Optimal GRB.
[Figure: nodes, each running a Miner-App with its log file, communicate with a broker layer containing the request broker (discovery), the TXD, a global job queue, resource monitoring, a resource selector with its fuzzy DT executer and node selection for jobs, a job scheduler and job submission, on top of the GT middleware layer with MDS]

Figure 6.1    General architecture of our broker
Before the GRB selects any nodes for a given job, one of these new tasks is executed: in addition to the previous tasks, it is responsible for sending a packet to each node on the grid. Needless to say, this task can be executed during the resource discovery operation of the GRB. As already stated, there are many different methods for finding resources (nodes); we do not concentrate on how nodes are discovered and will not discuss it in this paper.

Suppose there are many different nodes in our grid ready to execute jobs and we want to select some nodes from this pool. At the beginning, the GRB sends a packet to each node connected to our grid. This packet contains some information about the new job (e.g. sender IP, size of the job, required RAM and HDD, average time needed for execution, approximate execution start time, minimum CPU power, etc.). On the other side, when the MA on a node receives this packet, it opens the packet for analysis. If there are sufficient resources to run the desired job, the MA performs data processing on its own local database (log file) to compute some statistics for this job. Some of the produced results are as follows (a sketch of how a node might derive them from its log appears after the list):
- Average Hit Ratio (AHR): the average rate of success over all previous runs.
- Number of all jobs submitted to this node (AAJ).
- Number of all jobs submitted during this time period on previous days on this node (AATPJ).
- Number of all jobs successfully finished during this time period on previous days (NSTP).
- Hit ratio for this time period on previous days (HRTP); for example, how many jobs submitted between 1.30 AM and 2.00 AM have been executed successfully?
- Average size of successfully finished jobs (ASF).
- Average response time for finished jobs (ART).
- Average response time for jobs that have the same size as the proposed job and finished successfully (ARTSS).
- Hit ratio for the last twenty jobs (HRT).
- Date, time and size of the last successfully finished job (LSJ).
- Date, time and size of the last failed job (LFJ).
- Size of the largest successfully finished job (LSI).
- Number of all previous jobs of almost the same size as the proposed job (ASS). Needless to say, the size of a previous job is not exactly the same as the size of the desired job; for example, for a job of size 340 KB we would consider all previous jobs between 1 KB and 500 KB.
- Number of all previous jobs that have the same size as the proposed job and finished successfully (NSS).

Moreover, processor speed and CPU availability (idleness) are important for choosing a node.
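For illustration, assuming each log record carries a job size, a response time and a success flag (these field names are ours, and the similar-size band is an assumption), a node could derive several of the listed statistics like this:

from statistics import mean

def node_stats(log, job_size_kb):
    """Derive some of the listed statistics from a node's log records.
    Each record is a dict {'size_kb', 'response_s', 'ok'} (names assumed)."""
    done = [r for r in log if r['ok']]
    last20 = log[-20:]
    # ASS/NSS use a size band around the job; the band itself is an assumption.
    similar_ok = [r for r in done
                  if job_size_kb / 2 <= r['size_kb'] <= job_size_kb * 2]
    return {
        'AHR':   len(done) / len(log),                        # average hit ratio
        'AAJ':   len(log),                                    # all submitted jobs
        'HRT':   sum(r['ok'] for r in last20) / len(last20),  # hit ratio, last 20
        'ASF':   mean(r['size_kb'] for r in done),            # avg finished size
        'ART':   mean(r['response_s'] for r in done),         # avg response time
        'ARTSS': mean(r['response_s'] for r in similar_ok) if similar_ok else None,
        'LSI':   max(r['size_kb'] for r in done),             # largest finished job
        'NSS':   len(similar_ok),                             # similar-size successes
    }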
These results, together with the node information, are sent from each node to the GRB, which analyzes them to select or deselect the desired nodes. Note that the GRB always saves the most recently collected results.
6.4.2    Broker Layer
In this layer we have added two new sections beside the general broker sections. The first is the Request Broker section, which broadcasts the packet to all nodes in the grid and then receives and saves the results sent back from each node in the temporary XML database (TXD). Next, the Resource Selector section executes the fuzzy decision tree algorithm on the gathered results in the TXD. This task is done in a sub-section inside the Resource Selector that we call the FDT executer. Whenever this algorithm has finished its task, the next sub-section, SNJ (Selecting Nodes for Job), uses the result of the algorithm to identify suitable nodes.
6.4.2.1    FDT Executer
This section is responsible for executing the FDT algorithm on the TXD data. The FDT is a machine learning technique for extracting knowledge that is close to human decision making. In this research we use the fuzzy ID3 (FID3) algorithm because it is reliable and flexible and has high accuracy in selecting samples. All samples used for both training and testing are extracted from the provided database (TXD). After the FDT algorithm has been executed by the FDT executer, a desired class can be selected for the proposed jobs. Jobs can be divided into several groups: high-reliability jobs, real-time jobs, normal jobs, testing jobs, and so on.
6.4.2.2    SNJ Sub-layer
Based on the results gathered from the FDT executer, this section selects appropriate nodes according to the job conditions. There are many parameters in this section; the main parameters to be considered are as follows:

Very high reliability jobs: if we want to execute the desired job successfully with high reliability (response time is not very important), the AHR, HRTP, ASF and HRT measures are very important, and there is a priority among them. For example, to achieve high reliability, AHR and then HRTP have high priority; the other measures are also important. SNJ analyzes these measures from the gathered results (provided by the FDT executer). For example, if there are six nodes with almost the same AHR and HRTP (or ASF, and so on), then other measures (e.g. ART or HRT) are used to evaluate the performance of these nodes. There may be situations in which SNJ cannot select its nodes freely. For example, suppose SNJ needs to select seven nodes for the desired tasks and there exist only five nodes with high reliability (AHR and HRTP over 95%), the remaining nodes having low reliability (less than 50%); then the GRB can use other parameters to decrease the risk. For the two remaining nodes, SNJ can consider the HRT parameter, because this is better than random-based methods. All of this is done by SNJ. It can also use multi-versioning in a hierarchical architecture to increase reliability (Asgarali et al., 2007): it starts the versions on candidate nodes in parallel and distributed form by dispatching replicas of an offered job to the best-selected nodes in a specific order. For example, to perform job1 we can use three nodes in hierarchical form and send replicas of this job to the desired nodes. When one of these nodes finishes the job and returns its result correctly to the GRB, the GRB sends a message to stop and abort this task on the other nodes. In this way fault tolerance is improved, and hence the reliability of finishing the related task is increased.
Execution in real time: if we want to execute a job in real time, the CPU speed and ART have the highest priority, followed by ARTSS, HRTP, AHR, ASF, LSI and so on. Processor power and communication line bandwidth are also important. In this approach we concentrate on the two kinds of jobs described in this section.
For a fuzzy set, the idea of vagueness is introduced by assigning an indicator (membership) function that may take values in the range 0 to 1. The following quantities are used:

- Count(S_i): the number of successfully finished jobs on node i.
- Count(ST_i): the number of successfully finished jobs among the last 20 jobs submitted to node i.
- Count(AAJ_i): the number of all jobs submitted to node i.
- Min(ART): the minimum ART among all nodes.
- Max(ASF): the maximum ASF among all nodes.
- Min(CPU_SP_i): the minimum CPU speed among all nodes.

Supposing 1 ≤ i ≤ n (n is the number of nodes), we now describe how deterministic values are computed or converted to fuzzy values. Some attributes are computed by the membership functions below, and they are very important for deciding which nodes to select:
[Definitions of the nine fuzzy membership functions A1-A9, built from the quantities above]

As can be seen, A5 gives the ratio of successful jobs of similar size to the desired job to all successfully finished jobs, A7 reflects the CPU power, and A8 measures the system idleness in a fuzzy range.
For now, these nine attributes are evaluated as fuzzy values. We note that, depending on the type of job, each attribute is given a weight. These weights were allocated empirically, based on the effect of each attribute in the classification by the decision tree, and are presented in Table 6.1.
Table 6.1    The weight assigned to each attribute for each job type

Attribute   High reliability   Real-time    Normal jobs
A1          WH1 = 1.0          WR1 = 0.7    WN1 = 0.8
A2          WH2 = 0.4          WR2 = 1.0    WN2 = 0.8
A3          WH3 = 0.7          WR3 = 0.6    WN3 = 0.6
A4          WH4 = 0.9          WR4 = 0.4    WN4 = 0.6
A5          WH5 = 0.5          WR5 = 0.2    WN5 = 0.4
A6          WH6 = 0.3          WR6 = 0.1    WN6 = 0.2
A7          WH7 = 0.5          WR7 = 0.9    WN7 = 0.5
A8          WH8 = 0.4          WR8 = 0.8    WN8 = 0.5
A9          WH9 = 0.1          WR9 = 0.2    WN9 = 0.1
118
Advances in Artificial Intelligence Applications
Now, to find a node with very high reliability rather than
other nodes, we should compute the following computation
for each node and then we will select that node with
maximum value.
(8)
We will have a similar method for other type of jobs.
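Assuming the score in (8) is this weighted sum of the nine fuzzy attribute values, ranking the candidate nodes for each job type reduces to a few lines (the weights are those of Table 6.1; function and variable names are ours). Applied to the rows of Table 6.3 below with the high-reliability weights, this ranking reproduces the priority list reported after that table.

# Weights from Table 6.1, indexed A1..A9 for each job type.
WEIGHTS = {
    'high_reliability': [1.0, 0.4, 0.7, 0.9, 0.5, 0.3, 0.5, 0.4, 0.1],
    'real_time':        [0.7, 1.0, 0.6, 0.4, 0.2, 0.1, 0.9, 0.8, 0.2],
    'normal':           [0.8, 0.8, 0.6, 0.6, 0.4, 0.2, 0.5, 0.5, 0.1],
}

def rank_nodes(nodes, job_type):
    """Rank nodes by the weighted sum of their fuzzy attributes A1..A9, as
    in equation (8); nodes maps a node name to its nine attribute values."""
    w = WEIGHTS[job_type]
    return sorted(nodes,
                  key=lambda n: sum(wi * ai for wi, ai in zip(w, nodes[n])),
                  reverse=True)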
6.5    EXPERIMENTAL RESULTS AND DISCUSSION
We have designed two applications for our approach. The first application is executed on the nodes (the MA). The second application implements a new provider for the GRB and uses the MA results to select the best nodes for jobs.

In our experiments, eight computing nodes and one server are used to evaluate the performance of this approach; the hardware information is given in Table 6.2. The nodes communicate with the server via the Internet. The MA application is installed on the nodes and the broker provider application on the server. After that, we started to collect samples. We divide the 24 hours of the day into the following periods: 7-9, 9-12, 12-15, 15-17, 17-20, 20-22, 22-24, 24-2 and 2-7.
Table 6.2    Hardware information

Name     Type of hardware
Node1    Pentium 4 (cache 1 MB), CPU 2.2 GHz (Intel), RAM 256 MB
Node2    Pentium (cache 2 MB), CPU 2.4 GHz (Intel), RAM 512 MB
Node3    Intel Pentium, CPU 3.0 GHz, RAM 1 GB
Node4    Intel(R) Core(TM)2 Duo, CPU 2.16 GHz, RAM 3.49 GB
Node5    Intel(R) Core(TM)2 Duo, CPU 2.16 GHz, RAM 3.49 GB
Node6    Intel(R) Core(TM)2 Duo, CPU 2.16 GHz, RAM 3.49 GB
Node7    Intel(R) Core(TM)2 Duo, CPU 2.16 GHz, RAM 2.9 GB
Node8    HP ProLiant ML370 G4 High Performance, Intel Xeon 3.4 GHz (2 processors), L2 cache, RAM 8 GB
Server   Pentium 4 (cache 2 MB), CPU 3.0 GHz (Intel), RAM 1 GB
When a node connects to the grid (server), the MA immediately inserts a new record into the node's log file.

During the first six days we used the MA application but did not use its results in the broker application; instead, we always sent a job to all available nodes. After that, we activated the broker provider to select only suitable nodes. On the seventh day we obtained the results in Table 6.3 from the nodes available in the morning, for a high-reliability job of size 4.47 MB and an execution time of almost 18 minutes. As can be seen, all eight nodes were accessible at that moment.
Table 6.3    The computed results from 7.00 to 9.00 o'clock

Node     A1     A2     A3     A4     A5     A6     A7     A8     A9
Node1    0.89   0.90   0.90   0.90   1.00   0.55   0.00   0.54   0.02
Node2    0.87   0.94   0.90   0.85   0.95   0.80   0.21   0.77   0.04
Node3    0.90   0.92   0.90   0.85   1.00   0.67   0.39   0.97   0.05
Node4    0.94   0.95   0.95   1.00   1.00   0.90   0.48   0.87   0.43
Node5    0.95   0.96   1.00   0.90   1.00   0.85   0.48   0.98   0.39
Node6    0.96   0.95   1.00   1.00   1.00   0.70   0.48   0.16   0.38
Node7    0.92   0.93   0.90   0.95   1.00   0.85   0.48   0.80   0.28
Node8    0.80   1.00   0.85   1.00   1.00   1.00   0.78   0.65   1.00
Table 6.3 shows that, in the A2 column, Node8 is the best and Node1 the worst (in the fuzzy range). When these results were delivered to the server, the broker provider on the server side selected Node4 for this job. The job was then sent to this node for execution and, after a short time, finished successfully on Node4.

The priority list of nodes for this job (with high-reliability priority) was as follows:

Node4 > Node5 > Node8 > Node7 > Node6 > Node3 > Node2 > Node1

If the job had real-time priority, the broker provider application would select the following order:

Node8 > Node4 > Node5 > Node7 > Node3 > Node2 > Node6 > Node1
On the following days, all measurements were based on the broker provider application. After 120 measurements, we obtained the following results for a job of 10.24 MB with an approximate execution time of 22 minutes. The following results were sent by each participating node in the 12-15 period:
Table 6.4    The computed results from 12.00 to 15.00 on 21 August (job size 10.24 MB at 12:45 o'clock)

Node     A1      A2     A3     A4     A5     A6     A7     A8     A9
Node1    0.902   0.87   0.95   0.94   0.80   0.45   0.00   0.78   0.02
Node3    0.908   0.93   1.00   0.86   0.80   0.53   0.39   0.92   0.05
Node5    0.961   0.96   0.95   0.89   1.00   0.89   0.48   0.96   0.39
Node6    0.964   0.93   0.95   0.94   0.90   0.66   0.48   0.80   0.38
Node7    0.920   0.94   0.95   0.92   0.60   0.91   0.48   0.93   0.28
Node8    0.810   1.00   0.90   0.89   0.80   1.00   0.78   0.75   1.00
As can be seen, only six nodes were available. This job is considered a real-time job, so Node8 was selected as the best node by the proposed application. The selection priority of the nodes is as follows:

Node8 > Node5 > Node6 > Node7 > Node3 > Node1

If the job had a very high reliability priority, the selection priority would be:

Node5 > Node8 > Node6 > Node7 > Node3 > Node1

[Figure: bar/line chart comparing the ratio of successfully finished jobs under the random method and our provider]

Figure 6.2    The ratio of successfully finished jobs in both the random method and our provider
In this method we choose the best conditions for the job, whereas in other methods (for example, random methods) there may be a high risk in selecting a node. The ratio of successful jobs in our method is compared with a random method (Nader and Mohammadreza, 2006) in Figure 6.2; it shows that our method performs well in the stable state.

Finally, for the speedup evaluation, we executed a job with an execution time of 27 minutes and required RAM of 3.71 MB forty times, using both our method and the random method (Nader and Mohammadreza, 2006) with 1, 2 and 4 nodes. The obtained results are presented in Figure 6.3. As can be seen, the more freedom of choice we have, the more exact the selection and the fewer the failures.
Figure 6.3    The speedup ratio of the random method and our method for a constant job
The results show that our approach achieves better performance under this strategy. After each measurement, the ratio of successfully finished jobs appears to improve. It is notable that for all jobs smaller than 5 MB with an approximate time of less than 2 minutes, almost all jobs finished successfully.
6.6    CONCLUSION
Instantaneous resource selection for dynamic scheduling over the pool of discovered nodes is a challenging problem. Many methods have been presented for this purpose, but all of them have some restrictions. Our proposed approach is learning-based: it uses instantaneous resource selection to increase accuracy in resource scheduling and to reduce extra communication overhead and faults in the selection and computation cycle. The broker provider application, together with the MA, offers a dynamic decision on accessing any available and appropriate nodes based on the main criteria.

The results of our experiments show that this approach performs better than others and operates according to the user's requirements. Stability is a helpful characteristic of this approach, so faults are nearly predictable. As mentioned above, this approach is especially accurate in selecting resources; considering this characteristic, we recommend this method for cases in which sufficient nodes are available.
6.7    REFERENCES
Asgarali, B., Ali, M. and Bahman, A. (2007). A multi versioning scheduling algorithm for grid system based on hierarchical architecture. In Proceedings of the 7th IADIS International Conference on WWW/Internet, Vila Real, Portugal, Oct 2007.
Barbara, C., Babu, S. and Thyagaraja, K. K. (2001). EZ-Grid: integrated resource brokerage services for computational grids. http://www.cs.uh.edu/ezgrid/.
Christophe, M. and Bernadette, B. M. (2003). Choice of a method for the construction of fuzzy decision trees. The IEEE International Conference on Fuzzy Systems, Paris.
Dumitrescu, C. L. and Ian, F. (2005). GRUBER: a Grid resource SLA broker. In Euro-Par, Portugal, September 2005.
George, K. J. and Bo, Y. (1995). Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall.
Henri, C., Graziano, O., Francine, B. and Richard, W. (2000). The AppLeS parameter sweep template: user-level middleware for the grid. In Proceedings of the ACM/IEEE Conference on Supercomputing, IEEE Computer Society, 2000.
Ivan, R., Julita, C., M. Rosa, B. and Jesús, L. (2005). eNANOS Grid resource broker. In European Grid Conference (EGC2005), Springer Berlin/Heidelberg.
James, F., Todd, T., Miron, L., Ian, F. and Steven, T. (2001). Condor-G: a computation management agent for multi-institutional grids. In Proceedings of the 10th IEEE Symposium on High Performance Distributed Computing (HPDC10), San Francisco, CA, Aug 2001.
Janikow, C. Z. (1998). Fuzzy decision trees: issues and methods. IEEE Transactions on Systems, Man and Cybernetics, vol. 28.
Karl, C., Steven, F., Ian, F. and Carl, K. (2001). Grid information services for distributed resource sharing. In 10th IEEE Symposium on High Performance Distributed Computing, San Francisco, California, August 7-9, 2001.
Klaus, K., Rajkumar, B. and Muthucumaru, M. (2002). A taxonomy and survey of Grid resource management systems. Software Practice and Experience, 32(2), 135-164.
Motohide, U., Hirotaka, O., Hiroyuki, T., Fumio, K., Umedzu, K. and Junichi, K. (1994). Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems. Department of Systems Engineering and Precision Engineering, Osaka University, Japan, IEEE.
Nader, A. and Mohammadreza, M. (2006). A dynamic method for searching and selecting nodes in peer-to-peer fashion. 10th Conference on Computer Science in Tehran, IKT2006.
Seok, K. Y., Lok, Y. J., Gyoon, H. J., Jinsoo, K. and Joonwon, L. (2004). Design and implementation of an OGSI-compliant Grid broker service. Proc. of CCGrid.
Zadeh, A. L. (1984). Making computers think like people. IEEE Spectrum, 26-32.
7

RELEVANCE FEEDBACK METHOD FOR CONTENT-BASED IMAGE RETRIEVAL
Ali Selamat
Pei-Geok Lim
7.1    INTRODUCTION
The rapid growth of computer technologies and the advent of the World-Wide Web have increased the amount and complexity of multimedia information. In general, an image retrieval system is a computer system for browsing, searching and retrieving images from a large database of digital images (Long et al., 2003). Content-based image retrieval (CBIR) uses visual contents to search images from large-scale image databases according to users' interests. The visual contents of an image, such as color, shape, texture, and spatial layout, have been used to represent and index the image (Long et al., 2003; Crucianu et al., 2004). Recent retrieval systems have incorporated users' relevance feedback to modify the retrieval process in order to generate perceptually and semantically more meaningful retrieval results.
In computer-centric CBIR systems, the visual contents of the images are extracted and described by multi-dimensional feature vectors, which form a feature database. During the retrieval process, the user provides the retrieval system with the visual feature(s) and is required to specify a weight for each feature. Based on the provided features and specified weights, the retrieval system tries to find images similar to the user's query (Rui et al., 1998). The similarities or distances between the feature vectors of the query and those of the images in the database are then calculated (Long et al., 2003). This type of retrieval is supported by an indexing scheme, which provides an efficient way to search the image database.
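For instance, a weighted distance between the query's feature vector and an image's feature vector might be computed as follows (a minimal sketch; the weighted Euclidean form and the names are our assumptions):

import numpy as np

def weighted_distance(query_vec, image_vec, weights):
    """Weighted Euclidean distance between two feature vectors;
    larger weights make the corresponding feature matter more."""
    diff = np.asarray(query_vec) - np.asarray(image_vec)
    return float(np.sqrt((np.asarray(weights) * diff ** 2).sum()))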
The introduction of the relevance feedback method in CBIR aims to solve two main problems of CBIR: the subjectivity of human perception and the semantic gap between high-level concepts and low-level features (Tao et al., 2006). Regarding the human perception issue, different persons, or the same person in different situations, may perceive the same visual content differently (Rui et al., 1998). For example, one person may be more interested in an image's color feature while another is more interested in the texture feature of the same image. Even if both are interested in texture, the way they perceive the similarity of textures may be quite different (Rui et al., 1998). Therefore, relevance feedback based CBIR should aim to capture the user's preference.

Although relevance feedback has improved CBIR performance, another problem arises: users do not like to label too many images as feedback to the system (Qin et al., 2008). This can result in a limited and inaccurate set of information being returned to the system. In addition, the limited user feedback can cause an insufficiency of training samples, that is, of the images labeled by the user during the feedback process (Qin et al., 2008; Tao et al., 2006). Consequently, retrieval accuracy degrades, influencing the performance of CBIR.
Additionally, how to incorporate positive and negative examples to refine the query and the similarity measure in relevance feedback is a key issue of CBIR (Long et al., 2003). Hence, a classifier or statistical learning technique is needed to separate the positive samples (relevant images) and the negative samples (irrelevant images) into two different groups in the feature space (Tao et al., 2006; Hong et al., 2000). Many classification techniques have been applied to the relevance feedback task, such as Bayesian inference, boosting, Support Vector Machines (SVM) and many other statistical learning technologies (Qin et al., 2008). Among these classifiers, SVM-based relevance feedback is widely employed in CBIR. The SVM classifier is capable of learning from training data consisting of relevant and irrelevant images marked by users (Zhang et al., 2001). The SVM classifier has high generalization performance without the need for a priori knowledge, even when the dimension of the input space is very high (Zhang et al., 2001). Moreover, it performs well on pattern classification problems by minimizing the Vapnik-Chervonenkis dimension and achieving a minimal structural risk (Tao et al., 2006; Hong et al., 2000).

Even though SVM substantially improves the performance of relevance feedback based CBIR, the SVM classifier faces an instability problem when the size of the training set is small (Tao et al., 2006; Kim et al., 2007). As a result, the performance of SVM-based relevance feedback becomes poor when the number of labelled positive feedback samples is small (Tao et al., 2006; Kim et al., 2007; Qin et al., 2008), and the accuracy of an SVM relevance feedback based CBIR system in retrieving target images decreases. In conclusion, a modified relevance feedback mechanism needs to be developed in order to increase the performance of CBIR systems.
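A single SVM-based feedback round could be sketched as follows, using scikit-learn's SVC as the classifier (our choice for illustration; the chapter does not prescribe a library). The user-labeled relevant and irrelevant images form the training set, and the decision values re-rank the whole database:

import numpy as np
from sklearn.svm import SVC

def svm_feedback_round(features, relevant_idx, irrelevant_idx, top_k=20):
    """Train an SVM on user-labeled images and re-rank the whole database
    by decision value (higher = more likely relevant). Assumes at least
    one relevant and one irrelevant index are provided."""
    X = features[list(relevant_idx) + list(irrelevant_idx)]
    y = [1] * len(relevant_idx) + [0] * len(irrelevant_idx)
    clf = SVC(kernel='rbf', gamma='scale').fit(X, y)
    scores = clf.decision_function(features)
    return np.argsort(-scores)[:top_k]      # indices of the next images to show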
This chapter consists of five sections. Section 7.1 presents the introduction to the study and Section 7.2 reviews the literature on relevance feedback and CBIR. The proposed methodology is discussed in Section 7.3, and Section 7.4 presents the experimental results and discussion. The last section explains the conclusion and suggestions for future work.
7.2    RELATED WORK
In general, relevance feedback involves interaction between the human and the computer to refine high-level queries into low-level feature vectors (Rui et al., 1998). High-level queries are the descriptions supplied by the user, such as keywords or images, which are understandable and meaningful to the user. The low-level feature vectors are the features extracted from the user-supplied query, which are understood only by machines such as computers. In the past, relevance feedback was used in traditional text-based information retrieval systems. Later, the relevance feedback approach was introduced into CBIR in order to attack the semantic gap between high-level concepts and low-level features and the human perception subjectivity problem in CBIR. This technique has significantly improved the performance of CBIR.

Relevance feedback is a process that automatically adjusts the existing query using the information fed back by the user about the previously retrieved objects (Rui et al., 1998). The user is required to mark the retrieved images as either relevant or irrelevant to the query and feed the result back to the system. The system then performs retrieval again according to the user feedback, with the aim of obtaining a better result. This process continues until there is no further improvement in the result or the user is satisfied with it. In fact, this technique does not require the user to provide a precise initial query; rather, it helps to estimate the user's ideal query by using the positive (relevant) and negative (non-relevant) training samples fed back by the user. Therefore, relevance feedback based CBIR relies on relevant and non-relevant examples to reformulate the query.
According to Qi and Chang, relevance feedback techniques can be classified into three categories: query re-weighting, query shifting and query expansion (Qi and Chang, 2007). Query re-weighting assigns a new weight to each feature of the query (Qi and Chang, 2007; Rui and Huang, 1999; Qin et al., 2008; Cheng et al., 2008). Query shifting, also known as query refinement, moves the query to a new point in the feature space (Qi and Chang, 2007; Rui et al., 1998; Qin et al., 2008; Cheng et al., 2008). Both query re-weighting and query shifting apply a nearest-sampling approach to refine the query concept using the user's feedback. Lastly, query expansion uses a multiple-instance sampling approach to learn from the samples around the neighborhood of positively labeled instances (Qi and Chang, 2007). A small sketch of query shifting follows.
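Query shifting can be illustrated with a Rocchio-style update, a standard formulation we use here purely for illustration (alpha, beta and gamma are tuning parameters, not values from this chapter):

import numpy as np

def shift_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style query shifting: move the query toward the mean of the
    relevant feature vectors and away from the mean of the irrelevant ones."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q -= gamma * np.mean(irrelevant, axis=0)
    return q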
The traditional relevance feedback mechanism and several improved relevance feedback mechanisms have been proposed recently and applied in order to improve CBIR performance. For example, MARS is a learning method that combines both query vector moving and re-weighting techniques to estimate the ideal query parameters and learn from the user feedback. MARS performs well when the number of training samples is less than the length of the feature vector; however, it is not effective due to its limited ability to model non-linear distance functions (Rui et al., 1997). The problem of MARS can be overcome by using MindReader, which can model non-linear and quadratic functions. Although MindReader has a more rigorous estimation process than MARS, it only works well when the number of training samples is much larger than the length of the feature vector (Ishikawa et al., 1998). A novel relevance feedback technique proposed by Rui and Huang is capable of solving the constrained optimization problem faced by MARS and MindReader (Rui and Huang, 1999). It achieves the best performance in all retrieval conditions thanks to its optimal solutions for query estimation.
However, most of the previous works still do not achieve a satisfactory level of performance. Problems such as the weight adjustment issue, the limited user feedback issue and the similarity measurement issue also remain. Cheng et al. stated that traditional relevance feedback systems adjust the weight of each feature extracted from the query image (if there is more than one visual feature) either manually by the user or through values predetermined and fixed by the system (Cheng et al., 2008). Therefore, a two-level relevance feedback method has been proposed to let the user rank the images in a relevance order according to their interest (Cheng et al., 2008). Besides, SCLP (subspace clustering and label propagation) proposes two new units, representative image selection and label propagation, to be incorporated into the traditional relevance feedback method in order to overcome the problems of insufficient training samples and limited user feedback (Qin et al., 2008). Even though abundant effort has been contributed by researchers from all over the world, there is still room to improve the performance of CBIR.
7.3 METHODOLOGY
In this section, the proposed methodology will be discussed. Basically, the relevance feedback based CBIR process starts with the user, who is requested to provide a query image to the system. After that, the system analyzes the query image and retrieves a set of ranked images according to a similarity metric. Then, a relevance feedback process takes place; it is one way of identifying what the user is looking for in the current retrieval session by including the user in the retrieval loop (Crucianu et al., 2004). In the relevance feedback process, the user is requested to mark the images as either relevant or irrelevant to the query. The labeled images are fed back to the system, and the labeled information is used to improve the result of the next retrieval. Therefore, a revised ranked list of images is presented to the user in the next iteration (Das and Ray, 2007). The process of relevance feedback based CBIR is shown in Figure 7.1.

Figure 7.1 Process of relevance feedback based CBIR.
Figure 7.2 Proposed methodology.
The proposed methodology, shown in Figure 7.2, consists of three main parts: pre-processing, relevance feedback and classification. Two units, representative image selection and weight ranking, have been added to the traditional relevance feedback mechanism. Generally, a traditional relevance feedback based CBIR method consists of image retrieval, user labeling and classification units. The two added units are intended to solve some of the existing problems, namely limited information from the user, weight adjustment and similarity measurement issues. The representative image selection unit helps the system to select a set of informative images from the image database that can reflect the user's interest. Meanwhile, the weight ranking unit helps the system to determine which feature method, for example color or texture, is of more concern to the user. Hence, the system can adjust the weight of each feature more precisely and indirectly improve the accuracy of CBIR. Further details of preprocessing, similarity measurement, relevance feedback and classification are explained in the following subsections.
7.3.2 Preprocessing
In this study, both the query image and the database images will be pre-processed to filter out unnecessary noise. After that, the images will be resized to 128×128 pixels in order to reduce computational complexity. Subsequently, the processed images go through the two main steps of preprocessing: image segmentation and feature extraction. This preprocessing step transforms the images, which are understandable by humans, into another form that can be understood by machines such as computers.
Firstly, the image is segmented into several regions using an image segmentation technique. In this study, the pixel labelling technique based on Gaussian Mixture Models (GMM), also known as MAP (maximum a posteriori) segmentation (Blekas et al., 2005), will be used. Using MAP segmentation, the image colors are clustered and quantized into several classes (Liu et al., 2008). At the end of segmentation, a class-map of a particular image is obtained (Liu et al., 2008). Figure 7.3 shows the steps by which an image is segmented into several regions by using MAP segmentation.
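For intuition, the sketch below shows GMM-based pixel labeling with scikit-learn. It is only an approximation: it omits the spatial constraint of the actual model of Blekas et al. (2005), and the class count of six is an example drawn from the figures that follow.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_segment(image_rgb, n_classes=6):
    """Cluster the pixel colors with a GMM and label every pixel with the
    class of maximum posterior probability, producing a class-map.
    Unlike Blekas et al. (2005), no spatial smoothness prior is applied."""
    pixels = image_rgb.reshape(-1, 3).astype(float)
    gmm = GaussianMixture(n_components=n_classes, random_state=0).fit(pixels)
    labels = gmm.predict(pixels)              # MAP class label per pixel
    return labels.reshape(image_rgb.shape[:2])
```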
As shown in Figure 7.4, each class label in the class-map is classified as a region. Therefore, there are six regions for the six class labels in Figures 7.3 and 7.4. In the feature extraction step, low-level image features such as color and texture are extracted from each image region. For each region, the color feature is the average RGB value of all the pixels in that particular region, which is also the dominant color feature for that region (Liu et al., 2008). Therefore, three dimensions of color features are extracted for each region. Figure 7.4 shows the color features that have been extracted for each region within a particular image.
Figure 7.3 The process of MAP (maximum a posteriori approach) segmentation.
In general, image feature selection is a fundamental issue in designing a content-based image retrieval system. No single feature can perfectly represent the whole content of an image; a combination of two or more features represents the image content better. This study chooses two types of features, color and texture, to represent the image regions in order to reduce complexity.
After that, the texture features of each region are extracted using the Haar wavelet filter and the Discrete Wavelet Transform (DWT). In this study, the texture features are extracted from arbitrarily-shaped regions (Liu et al., 2008).
Figure 7.4 Color features for each region within the particular image.
The process of texture feature extraction for each particular region is as follows:

1. Scale down the given 2-D image into four sub-images whose wavelets are in three orientations: horizontal, vertical and diagonal (Zhang et al., 2001).

2. Decompose the image up to a four-level decomposition using the Haar wavelet filter and the DWT. According to Zhou et al. (Zhou et al., 2000), a three- or four-level decomposition is expected to extract more robust features.
3. Calculate the mean, μ_mn, and standard deviation, σ_mn, of the transform coefficients in each sub-image (Manjunath et al., 2001). The mean and standard deviation values are used as the texture features of the region. The equations for the mean, μ_mn, and standard deviation, σ_mn, are (Manjunath et al., 2001):

   μ_mn = ∬ W_mn(x, y) dx dy                                  (1)

   σ_mn = √( ∬ (W_mn(x, y) − μ_mn)² dx dy )                   (2)

   where m = scale of the image,
         n = orientation of the image, and
         W_mn(x, y) = energy coefficients of the filtered image output.
4. Combine the wavelet channels as shown in Figure 7.5 for the rotation-invariant and scale-invariant texture criteria.

5. Quantize the coefficients from the wavelet channels or combination channels.

6. Calculate the linear sums of the channel coefficients for the rotation-invariant and scale-invariant texture criteria, as shown in Figure 7.5.
The channel combination descriptor is defined as a description of arbitrary combinations of wavelet channels. The main concept of the channel combination description is to linearly sum up all the channel energies. It is also assumed that it can represent the texture in a more compact form while retaining the most important information about the texture (Ohm et al., 2000). Figure 7.5 shows the process of extracting the texture features, namely the mean and standard deviation, and the rotation-invariant and scale-invariant features obtained using the channel combination descriptor. Figure 7.5(a) shows the channel groups for the rotation-invariant and scale-invariant criteria. Figure 7.5(b) shows the process of combining wavelet channels 10+11+12, 7+8+9, 4+5+6 and 1+2+3 to fulfil the rotation-invariant criterion, and wavelet channels 1+4+7+10, 3+6+9+12 and 2+5+8+11 to fulfil the scale-invariant texture criterion. Lastly, Figure 7.5(c) shows the quantization and up-sampling of the channel energies. A 31-dimensional texture feature is thus extracted from each region.
As a result, a total of 34 feature dimensions are extracted for each region. All extracted features are used for feature similarity measurement.
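A minimal sketch of this extraction, assuming the PyWavelets library and a rectangular region, is shown below; it computes only the 24 subband mean and standard deviation values of equations (1) and (2) and omits the channel combination step and the handling of arbitrarily-shaped regions.

```python
import numpy as np
import pywt

def wavelet_texture(region, levels=4):
    """Mean and standard deviation of the Haar DWT subband coefficients.
    A 4-level decomposition yields 12 detail subbands (3 orientations per
    level), i.e. 24 values; the 7 channel-combination values are omitted."""
    coeffs = pywt.wavedec2(np.asarray(region, dtype=float), "haar", level=levels)
    features = []
    for detail in coeffs[1:]:                 # (horizontal, vertical, diagonal)
        for band in detail:
            features += [np.abs(band).mean(), np.abs(band).std()]
    return np.array(features)
```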
7.3.3 Feature Similarity Measure
This subsection describes how the similarities among images are measured by comparing the differences between their features. In CBIR, features such as color and texture represent the content of an image; these features can then be used to show how similar two images are. Hence, the similarity comparison is performed based on the visual content descriptors obtained by the DWT and RGB feature extraction techniques. It is also necessary to define a suitable dissimilarity measure for comparing region-based image representations (Greenspan et al., 2000; Liu et al., 2008). In this study, the Earth Mover's Distance (EMD) is used as the feature similarity measurement. EMD computes the dissimilarity between sets of regions and returns the correspondence between them (Greenspan et al., 2000).
Given a query image I_Q with m segmented regions, I_Q = {(R_Q^i, w_Q^i) : i = 1, ..., m}, and a target image I_T with n segmented regions, I_T = {(R_T^j, w_T^j) : j = 1, ..., n}, R_Q^i and R_T^j are the ith and jth regions of the query image and the target image, while w_Q^i and w_T^j are the weights of the regions. For each iteration of image retrieval, the weight of a region is defined as the ratio of the region size to the image size (Liu et al., 2008). The initial weight of the color and texture features is 1/N, where N is the total number of image features.

Figure 7.5 Texture features for each region within the particular image (Ohm et al., 2000).

The EMD distance measurement formula is as follows:
EMD(I_Q, I_T) = ( Σ_{i=1}^{m} Σ_{j=1}^{n} v_ij d_ij ) / ( Σ_{i=1}^{m} Σ_{j=1}^{n} v_ij )          (3)
where d_ij is the ground distance between regions R_Q^i and R_T^j. It is the Euclidean distance that calculates the dissimilarity between the region features F_Q^i = {R_Q^i, G_Q^i, B_Q^i, μ_00Q^i, σ_00Q^i, ..., μ_32Q^i, σ_32Q^i, c_1Q^i, ..., c_7Q^i} and F_T^j = {R_T^j, G_T^j, B_T^j, μ_00T^j, σ_00T^j, ..., μ_32T^j, σ_32T^j, c_1T^j, ..., c_7T^j}:

d_ij = d(F_Q^i, F_T^j)
     = w_c [ (R_Q^i − R_T^j)² + (G_Q^i − G_T^j)² + (B_Q^i − B_T^j)² ]
       + w_t [ Σ_{s=0}^{3} Σ_{o=0}^{2} e + Σ_{k=1}^{7} (c_kQ^i − c_kT^j)² ]          (4)

where w_c = weight of the color feature,
      w_t = weight of the texture feature, and
      e = (μ_soQ^i − μ_soT^j)² + (σ_soQ^i − σ_soT^j)².
The term v_ij, which appears in the EMD formula, is the weight assigned to d_ij. It is determined by minimizing EMD(I_Q, I_T) subject to the following constraints (Liu et al., 2008):

v_ij ≥ 0,                          1 ≤ i ≤ m, 1 ≤ j ≤ n;
Σ_{j=1}^{n} v_ij ≤ w_Q^i,          1 ≤ i ≤ m;
Σ_{i=1}^{m} v_ij ≤ w_T^j,          1 ≤ j ≤ n;
Σ_{i=1}^{m} Σ_{j=1}^{n} v_ij = min( Σ_{i=1}^{m} w_Q^i, Σ_{j=1}^{n} w_T^j )          (5)

The above constraints show that the minimum total weight among the query regions and the target regions is taken as the total flow assigned to the distances d_ij when calculating the distance between region features.
At the end of the feature similarity measurement, a ranked list of EMD distances between the database images and the query image is generated. Ideally, the comparison between region features helps the system to retrieve images that contain the same concept as the query image. The smaller the distance between the regions of the query image and a target image, the closer the concepts they share.
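A small sketch of this optimization, posed as a linear program with SciPy, is given below; the function is illustrative and assumes the ground distance matrix d has already been computed with equation (4).

```python
import numpy as np
from scipy.optimize import linprog

def emd(w_q, w_t, d):
    """EMD between two region signatures per equations (3) and (5).
    w_q (m,) and w_t (n,) are the region weights; d (m, n) holds the
    ground distances d_ij."""
    m, n = d.shape
    total = min(w_q.sum(), w_t.sum())          # total flow, equation (5)
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0       # sum_j v_ij <= w_Q^i
    for j in range(n):
        A_ub[m + j, j::n] = 1.0                # sum_i v_ij <= w_T^j
    res = linprog(d.reshape(-1), A_ub=A_ub,
                  b_ub=np.concatenate([w_q, w_t]),
                  A_eq=np.ones((1, m * n)), b_eq=[total],
                  bounds=(0, None), method="highs")
    return res.fun / total                     # equation (3)
```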
7.3.4 Relevance Feedback
As mentioned above, the relevance feedback mechanism is repeated iteratively until the user is satisfied with the retrieval result. The relevance feedback part consists of five units: image retrieval, representative image selection, user labeling, weight ranking, and SVM learning and classification.
7.3.4.1 Image Retrieval
A ranked list of EMD distances for the database images is the input to this unit. However, there are two different types of output from this unit. At the beginning of image retrieval, the top N images from the ranked list are displayed for the user labeling process, where N is the number of images output for the user to mark. For the next iteration, a set of estimated possibly positive images (EPPIS) is selected and output. The EPPIS set is a dynamic image collection: a sub-image set with consistent characteristics. Moreover, EPPIS contains the images that are closest to the query point at the beginning of retrieval. Thus, images that are classified as positive with high confidence are included in EPPIS. This is done for the sake of the newly proposed representative image selection unit (Qin et al., 2008).
7.3.4.2 Representative Image Selection
The representative image selection unit aims to select a set of representative images from the EPPIS set and retrieve these selected representative images for the user labeling process. As a result, the total number of images retrieved for user labeling equals the size of the representative image set. The representative image set is a subset of EPPIS that has minimum information loss and does not contain too much redundancy. The representative image set is defined as follows (Qin et al., 2008):

R = arg min_Y { d(EPPIS, Y) : Y ⊆ EPPIS, N_Y = N_R }          (6)

where N_Y = number of elements in Y, and
      N_R = number of elements in the representative image set R.
Basically, all the images, including the user query image and the database images, are partitioned from the perspective of features, namely regions and their color and texture features. In representative image selection, informative images are selected from the EPPIS set such that the selected images have the smallest distance to the query image. Figure 7.6 shows the algorithm of representative image selection. Beforehand, a few assumptions need to be considered (a simplified sketch following these assumptions is given after the list):
1. It is possible for a region, such as region 1 in the query image I_Q, to have the same concept as region 2 in target image I_T1 and region 1 in target image I_T2. Therefore, each region in the query image needs to be compared with every region of all the target images in the EPPIS set.
2. Using the Euclidean measurement on the region features, the target image with the smallest region-feature distance is included in the representative image set. For example, if the color feature of region 3 of target image I_T1 has the smallest distance to the color feature of region 1 of the query image compared to the other images in EPPIS, then target image I_T1 is taken as a representative image.
3. If we select the representative images from EPPIS independently based on the concept stated in assumption 2 above, the same representative image may be selected for two different regions of the query image. To avoid this problem, the element-to-element distance d(m) is used to check whether a newly selected representative image is too close to any representative image already in R(m), where m is the current total number of representative images. If it is too close, it is deleted and the selection of a representative image for that query region is repeated. As a result, the target image with the second smallest distance to the query image is selected as the representative image. This process continues until the selection of the representative image set becomes stable.
Figure 7.6 The algorithm of the representative image selection.
4. Each region and its corresponding features are assigned a particular weight. This means the regions are not equally important; the selection of representative images therefore depends on the assigned weights. For example, suppose we have two regions with weights 0.6 and 0.4 respectively, and we need to select 10 representative images from EPPIS. Then 0.6×10 = 6 images are selected based on region 1 and 0.4×10 = 4 images are selected based on region 2 of the query image.
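The following simplified sketch mirrors the four assumptions above; the data structures and the redundancy threshold tau are illustrative assumptions, and the authoritative version is the algorithm of Figure 7.6.

```python
import numpy as np

def select_representatives(region_dists, img_dists, region_weights, n_rep, tau):
    """Simplified representative image selection following assumptions 1-4.
    region_dists[r][k]: distance from query region r to its best-matching
    region in EPPIS image k (assumption 1 is assumed precomputed);
    img_dists[k][l]: element-to-element distance between EPPIS images;
    tau: redundancy threshold (assumption 3)."""
    chosen = []
    for r, w in enumerate(region_weights):
        quota = int(round(w * n_rep))          # assumption 4: weighted quota
        for k in np.argsort(region_dists[r]):  # assumption 2: nearest first
            if quota == 0:
                break
            if k in chosen:
                continue
            # assumption 3: skip candidates too close to already chosen images
            if all(img_dists[k][c] > tau for c in chosen):
                chosen.append(int(k))
                quota -= 1
    return chosen
```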
7.3.4.3 User Labeling
In the user labeling process, the user marks the retrieved representative images as either relevant or irrelevant according to their desired target images. The labels are fed back to the system for further analysis. In this study, the user is required to give a sequence of images, known as the image ranking sequence, and this sequence is fed back to the CBIR system. Specifically, the user ranks a sequence of relevant images with respect to their similarity to the query image (Cheng et al., 2008). In the user labeling unit, two types of user-defined preference relations, > and =, are used by the user to express the image ranking sequence. For example, if the similarity degrees of I_T1 and I_T2 are the same, this is denoted I_T1 = I_T2. Meanwhile, if the similarity degree of I_T1 is greater than that of I_T2, this is denoted I_T1 > I_T2. In other words, the image ranking sequence runs from the most similar image on the left to the least similar image on the right, relative to the user's desired image.
According to Cheng et al. (2008), several criteria show that this image ranking sequence gives more meaningful feedback to the CBIR system. These criteria are:

1. The user is able to provide more information about the relevant images through the image ranking sequence. Therefore, more information is obtained from the same number of judged examples, and the number of relevance feedback iterations can be reduced (Cheng et al., 2008).

2. The user can easily weight the various features, namely the regions and region features, based on their interest. It is difficult for a user to define an exact similarity value, but easy to distinguish which images are more similar than others (Cheng et al., 2008). Thus, the user ranks the similarity degree of each relevant image in the image ranking sequence.

3. From the user's image ranking sequence, we can estimate which feature methods are closer to the user's interest. We can then analyze how close each feature method is to the image ranking sequence given by the user, and the weight of each feature is adjusted according to this analysis (Cheng et al., 2008).
7.3.4.4 Weight Ranking
In the weight ranking unit, the CBIR system obtains the label of each representative image retrieved in the user labeling process. The images returned in the image ranking sequence are labeled as relevant images, and the rest are labeled as irrelevant. After that, the system analyzes the relevant images and finds the region areas that are similar in each of them. Next, the system calculates and updates the weights for the regions and the region features using the formulas shown below.

For the region features, namely the color and texture features, the weight adjustment process uses formula (7) (Qin et al., 2008). For any given query image, the corresponding weights of the region features are F = {w_c, w_t}, where w_c is the weight of the color feature and w_t is the weight of the texture feature.
w_f = (1/γ(f)) / ( Σ_{k=1}^{|F|} 1/γ(k) )          (7)

where γ(f) = Σ_{x∈L, y∈L} (d^(f)(x, y))² and d^(f) is the element-to-element distance metric for the region features among the relevant labeled images.
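A direct transcription of equation (7) might look as follows, assuming the pairwise distances among the relevant labeled images have already been computed.

```python
import numpy as np

def update_feature_weights(pairwise_dists):
    """Weight update per equation (7). pairwise_dists[f] holds the
    element-to-element distances d^(f)(x, y) between every pair of
    relevant labeled images for feature f (e.g. color, texture)."""
    gamma = np.array([(np.asarray(d) ** 2).sum() for d in pairwise_dists])
    inv = 1.0 / gamma           # features with tightly clustered relevant
    return inv / inv.sum()      # images (small gamma) get larger weights
```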
Each region of an image is ranked by the system in a ranking sequence according to its distance to the query image. The ranking sequence generated by a feature method is closer to the ranking sequence given by the user when that feature is closer to the user's opinion.

For example, assume the ranking sequence for region 1 is i4 > i3 > i2 > i1 and the ranking sequence for region 2 is i1 > i2 > i4 > i3, while the ranking sequence given by the user is i1 > i2 > i3 > i4, where i is the index of the database images. Then the weight of region 1 decreases and the weight of region 2 increases during the weight adjustment process. According to Cheng et al. (Cheng et al., 2008), the importance of each region is based on the comparison of sequences. To adjust the weights of the regions, formula (8) is used to evaluate how close two sequences are (Cheng et al., 2008).
R_norm(Δ_system, Δ_user) = (1/2) ( 1 + (S⁺ − S⁻) / S⁺_max )          (8)
where Δ_system = rank ordering of the labeled relevant images induced by the similarity values computed by the image retrieval system,
      Δ_user = rank ordering of the labeled relevant images by the user,
      S⁺ = number of image pairs where a better image is ranked ahead of a worse one by Δ_system,
      S⁻ = number of image pairs where a worse image is ranked ahead of a better one by Δ_system, and
      S⁺_max = the maximum possible number of S⁺ obtained from Δ_user.
According to Cheng et al. (Cheng et al., 2008), the R_norm calculation is based on the ranking of image pairs in Δ_system relative to the ranking of the corresponding image pairs in Δ_user. R_norm ranges from 0 to 1, and a value of 1 indicates that the system's rank ordering is the same as the ranking provided by the user (Cheng et al., 2008). As a result, R_norm can represent which part of the query image the user pays attention to. For a given query image I_Q with m segmented regions, the system estimates the R_norm of each region, R_norm = (r_1, r_2, ..., r_m). The new weight of each region is defined in formula (9):

w_Q^i = r_i / Σ_{j=1}^{m} r_j          (9)
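The following sketch computes R_norm per equation (8) under the assumption of a strict user ranking (no = ties), in which case S⁺_max equals n(n − 1)/2.

```python
def rnorm(system_rank, user_rank):
    """Equation (8) for two rankings given as ordered lists of image ids,
    best first. Assumes the user ranking is strict, so S+_max = n(n-1)/2."""
    n = len(user_rank)
    pos = {img: r for r, img in enumerate(user_rank)}   # 0 = best for the user
    s_plus = s_minus = 0
    for a in range(n):
        for b in range(a + 1, n):
            i, j = system_rank[a], system_rank[b]       # system ranks i ahead of j
            if pos[i] < pos[j]:
                s_plus += 1                             # better image ranked ahead
            elif pos[i] > pos[j]:
                s_minus += 1                            # worse image ranked ahead
    s_max = n * (n - 1) // 2
    return 0.5 * (1 + (s_plus - s_minus) / s_max)
```

With the example above, rnorm(['i4','i3','i2','i1'], ['i1','i2','i3','i4']) returns 0, while a perfectly matching sequence returns 1, so the region 1 weight would shrink relative to region 2.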
7.3.4.5 Support Vector Machine (SVM) Learning and Classification
A Support Vector Machine (SVM) is used to classify the unlabeled images of the image database into two classes: relevant and irrelevant. First of all, the user-labeled images are used to train the SVM learner. After that, all the database images not yet labeled by the user are used in the SVM classification process. In this study, a two-class SVM is used for classification. The images classified as relevant are retrieved for the next iteration. The feature distance measurement information is used as the index values of the sample images in the training and classification processes; it is the feature distance of a labeled or unlabeled sample image to the query image.
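A minimal sketch of this step using scikit-learn's SVC (which wraps LibSVM) is shown below; the variable layout is an assumption, while the binary linear-kernel setting follows Table 7.3.

```python
import numpy as np
from sklearn.svm import SVC

def classify_database(labeled_X, labeled_y, unlabeled_X):
    """Train a binary linear-kernel SVM (Table 7.3) on the user-labeled
    feature-distance vectors and classify the rest of the database.
    labeled_y uses 1 = relevant, 0 = irrelevant."""
    clf = SVC(kernel="linear")
    clf.fit(labeled_X, labeled_y)
    pred = clf.predict(unlabeled_X)
    return np.flatnonzero(pred == 1)   # images retrieved for the next iteration
```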
The relevance feedback mechanism is repeated iteratively until the user is satisfied with the retrieval result. In other words, the system has then successfully formulated the user's area of interest and correctly re-weighted each region and its region features.
7.4 EXPERIMENT SETUP
In this section, we describe the experiment and its results in detail. The experiment used five categories of database images: animal, building, flower, fruits and natural scene. The image database has 1000 images in total, and each category consists of 200 images. Table 7.1 shows the categories of the image database. This dataset is used to evaluate the performance of the proposed relevance feedback based CBIR system. Table 7.2 shows the hardware specification used in the experiment, and Table 7.3 shows the default parameters used in the experiments.
In this study, the information retrieval measurements precision, recall and F1 are used to evaluate the performance of CBIR. Precision (P), recall (R) and F1 are three standard measurements commonly used to evaluate the quality of results in the information retrieval domain. Basically, precision measures the exactness of the system's performance, whereas recall measures the completeness of the data retrieved by the system. F1 is the measurement that evaluates the balance between precision and recall. The formulas of these three standard information retrieval measurements are shown below, and the notation used in them is explained in Table 7.4.
Table 7.1 Categories of image database.

Category    Category Name     Amount
1           Animal            200
2           Building          200
3           Flower            200
4           Fruits            200
5           Natural Scene     200

Table 7.2 Hardware specification used in the experiment.
No    Hardware                    Specification
1     Processor                   Intel Centrino Dual Core
2     Memory (SDRAM 667 MHz)      DDRII 2GB
3     Operating System            MS Windows Vista
Table 7.3 Default parameters used in the experiments.

Condition                                Parameter
Number of retrieved images               20 images
Number of images in the EPPIS set        100 images
Number of iterations                     6 iterations
Number of training samples for SVM       All labeled images
Number of classification samples         Unlabeled images = entire database
for SVM                                  images − all labeled images
Type of SVM                              Binary SVM with default parameter
                                         setting for the linear kernel
Table 7.4 Description of the information retrieval measurements
                Human Yes    Human No
System Yes         a            b
System No          c            d

precision(P) = a / (a + b)          (10)

recall(R) = a / (a + c)             (11)

F1 = 2PR / (P + R)                  (12)
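These three measurements translate directly into code; the counts in the usage comment are hypothetical.

```python
def retrieval_metrics(a, b, c):
    """Equations (10)-(12), with a, b, c as defined in Table 7.4:
    a = relevant retrieved, b = irrelevant retrieved, c = relevant missed."""
    precision = a / (a + b)
    recall = a / (a + c)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 15 relevant retrieved, 5 irrelevant retrieved, 185 relevant missed:
# retrieval_metrics(15, 5, 185) -> (0.75, 0.075, 0.1364)
```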
7.4.2 Experiment Result and Discussion
In this subsection, the experiment results are discussed. Figures 7.7 to 7.9 show the precision, recall and F1 results of the proposed relevance feedback method for the five categories of the image database: animal, building, flower, fruits and natural scene. Meanwhile, Figure 7.10 shows the average performance of the proposed relevance feedback based CBIR over the five categories.

According to Figure 7.7, the fruits category shows the highest precision in almost all the retrieval iterations. However, the fruits category shows a decreasing pattern from the first iteration to the sixth. This may be because several kinds of fruits, such as apple, orange, banana and so on, were collected under the fruits category, which makes the results of the fruits category unstable. The flower category shows an increasing pattern from the first to the sixth iteration. This is because the collected flower images have similar and consistent characteristics.
Figure 7.7 Precision rate for five categories of images using the proposed relevance feedback method.
The recall and F1 rates show an increasing pattern for all categories of the image database, as illustrated in Figures 7.8 and 7.9. At the same time, the flower and fruits categories show higher rates than the other three categories. The main reason is that almost all the images under the fruits and flower categories have consistent characteristics and do not contain unnecessary noise, especially unwanted backgrounds and objects that do not describe the category.
According to Figure 7.10, the experiment results indicate that the recall and F1 rates of the proposed method increase from the first retrieval to the sixth iteration. Hence, it can be concluded that the proposed relevance feedback method achieves better performance after the sixth iteration. From the increasing pattern shown in Figure 7.10, it can also be concluded that the proposed relevance feedback based CBIR system is capable of selecting the positive images from the database images.
As a result, the representative image selection unit is able to choose significant and informative positive images from the database images rather than merely following the result of the similarity measurement. Besides, the experiment results also show that the weight ranking unit is capable of re-weighting the features more precisely. Hence, the proposed method retrieves more relevant images as the number of relevance feedback iterations increases. Lastly, the experiments also show that incorporating the representative image selection and weight ranking units increases the performance of the CBIR system.
Figure 7.8 Recall rate for five categories of images using the proposed relevance feedback method.
Figure 7.9 F1 rate for five categories of images using the proposed relevance feedback method.
Figure 7.10 Average precision, recall and F1 rate for the five categories of the image database.
7.5 CONCLUSION
Based on the experiment results, the proposed method achieves better performance after the sixth retrieval iteration. The experiments also show that by solving the limited user feedback and weight adjustment issues, the performance of CBIR can be improved. Hence, the proposed method is capable of addressing these CBIR problems.

However, the performance of the proposed method is still not outstanding. Much effort is still needed to substantially increase the performance of CBIR and to reduce the number of iterations needed in the relevance feedback process. There is potential for improved performance if a better preprocessing method is applied instead of only using MAP segmentation (Blekas et al., 2005), the Haar filter (Smith and Chang, 1996; Manjunath et al., 2001) with the DWT, and the wavelet channel combination technique (Ohm et al., 2000). Preprocessing could therefore be considered as future work, since the performance of CBIR may vary when different preprocessing methods are applied, and more significant data could be extracted if a more significant feature extraction technique were used. In conclusion, the performance of the proposed method may be further improved if more feature extraction methods, such as shape, spatial layout and geometry features, are used.
7.6 REFERENCES
Blekas, K., Likas, A., Galatsanos, N. and Lagaris, I. (2005). A
Spatially-Constrained Mixture Model for Image
Segmentation. IEEE Transactions on Neural Networks, vol.
16(2), 494-498.
Cheng, P. C., Chien, B. C., Ke, H. R. and Yang, W. P. (2008). A
two-level relevance feedback mechanism for image
retrieval. Expert Systems with Applications, Vol. 34, Issue
3, 2193-2200.
Crucianu, M., Ferecatu, M. and Boujemaa, N. (2004). Relevance feedback for image retrieval: a short survey. Report of the DELOS2 European Network of Excellence (6th Framework Programme).
Das, G. and Ray, S. (2007). A comparison of relevance feedback
strategies in CBIR. Computer and Information Science,
2007. ICIS 2007. 6th IEEE/ACIS International Conference,
100-105.
Greenspan, H., Dvir, G. and Rubner, Y. (2000). Region Correspondence for Image Matching via EMD Flow. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'00) (June 16, 2000). IEEE Computer Society, Washington, DC, 27.
Hong, P. Y., Tian, Q. and Huang, T. S. (2000). Incorporate
support vector machines to content-based image retrieval
with relevance feedback. Proceedings 2000 International
Conference Image processing, Vol. 3, 750-753.
Ishikawa, Y., Subramanya, R. and Faloutsos, C. (1998). MindReader: Querying databases through multiple examples. Proceedings of the 24th VLDB Conference, New York.
Kim, D. H., Song, J. W., Lee, J. H. and Choi, B. G. (2007).
Support Vector Machine Learning for region-based image
retrieval with Relevance Feedback. ETRI Journal. Vol. 29,
Number 5, 700-702
Liu, Y., Zhang, D. S. and Lu, G. (2008). Region-Based Image
Retrieval with High-Level Semantics using Decision Tree
Learning. Pattern Recognition, 41(8), 2554-2570.
Long, F., Zhang, H. J. and Feng, D. D. (2003). Multimedia
Information Retrieval and Management: Technological
Fundamentals and Applications, chapter Fundamentals of
Content-Based Image Retrieval. Springer-Verlag, Berlin.
Manjunath, B., Wu, P., Newsam, S. and Shin, H. (2001). A texture
descriptor for browsing and similarity retrieval. Signal
Processing Image Communication, 16:33-43.
Ohm, J., Bunjamin, F., Liebsch, W., Makai, B., Müller, K., Smolic, A. and Zier, D. (2000). A Set of Visual Feature Descriptors and their Combination in a Low-Level Description Scheme. Signal Processing: Image Communication 16, 157-179.
Qin, T., Zhang, X. D., Liu, T. Y., Wang, D. S., Ma, W. Y. and Zhang, H. J. (2008). An active feedback framework for image retrieval. Pattern Recognition Letters, Vol. 29, Issue 5, 637-646.
Qi, X. J. and Chang, R. (2007). Image retrieval using transaction-based and SVM-based learning in relevance feedback sessions. In M. Kamel and A. Campilho (Eds.): ICIAR 2007, LNCS 4633, 638-649. Springer-Verlag Berlin Heidelberg.
Rui, Y. and Huang, T. S. (1999). A novel relevance feedback technique in image retrieval. In: Proceedings 7th ACM Conference on Multimedia, 67-70.
Rui, Y., Huang, T. S. and Mehrotra, S. (1997). Content-based
image retrieval with relevance feedback in MARS.
Proceedings IEEE International Conference on Image
Processing, Vol. 2, 815-818.
Rui, Y., Huang, T. S., Ortega, M. and Mehrotra, S. (1998). Relevance feedback: A power tool in interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5), 644-655.
Smith, J. R. and Chang, S. F. (1996). Automated binary texture
feature sets for image retrieval. Proceedings ICASSP-96.
Tao, D. C., Tang, X. O., Li, X. L. and Wu, X. D. (2006). Asymmetric Bagging and Random Subspace for Support Vector Machines-Based Relevance Feedback in Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7.
Zhang, L., Lin, F. and Zhang, B. (2001). Support Vector Machine Learning for Image Retrieval. Proceedings IEEE International Conference on Image Processing, vol. 2, 721-724.
Zhou, S. H., Venkatesh, Y. V. and Ko, C.C. (2000). Wavelet-based
Texture Retrieval and Modeling Visual Texture Perception.
Master of Engineering Thesis. National University of
Singapore (NUS).
8 DETECTING BREAST CANCER USING TEXTURE FEATURES AND SUPPORT VECTOR MACHINE
Al Mutaz Abdalla
Safaai Deris
Nazar Zaki
8.1 INTRODUCTION
Localized textural analysis of breast tissue on mammograms has recently gained considerable attention from researchers studying breast cancer detection. Despite research progress on the problem, detecting breast cancer based on textural features has not been investigated in depth. In this work, we study breast cancer detection based on statistical texture features using a Support Vector Machine (SVM). A set of textural features was applied to a set of 120 digital mammographic images from the Digital Database for Screening Mammography. These features are then used in conjunction with SVMs to detect breast cancer tumors.
8.2 RELATED WORK
Breast cancer is the second leading cause of death for women around the world. An average woman has a one-in-eight chance (about 12%) of developing breast cancer during her life. Early detection of breast cancer by means of screening mammography has been established as an effective way to reduce the mortality rate resulting from breast cancer (Smith, 1995; Tabar, 1995). Despite significant recent progress, the recognition of suspicious abnormalities in digital mammograms remains a difficult task, for at least several reasons. First, mammography provides relatively low-contrast images, especially in the case of dense or heavy breasts. Second, symptoms of abnormal tissue may remain quite subtle. For example, spiculated masses that may indicate malignant tissue within the breast are often difficult to detect, especially at an early stage of development (Arodz, 2005).
The recent use of localized texture and machine learning (ML) classifiers has established a new research direction for detecting breast cancer. Localized texture analysis of breast tissue on mammograms remains an issue of major importance in mass characterization. However, in contrast to other mammographic diagnostic approaches, it has not been investigated in depth, due to its inherent difficulty and fuzziness (Mavroforakis, 2006). Many studies have focused on the general issue of textural analysis of mammographic images, in the context of detecting the boundaries of tumors and microcalcifications (Lisboa, 2002; Arivazhagan, 2003). However, none of these studies has taken into consideration the classification of normal, benign and malignant cases all together. In this work, we study the classification of a total of 120 digital mammographic images containing 60 normal, 30 benign and 30 malignant cases.
8.3 MATERIALS AND METHOD
A set of statistical texture feature functions was applied to a set of 120 digitized mammograms in specified regions of interest. The measurements are made on co-occurrence matrices in two different directions.
In the first step, a digitized sample consisting of 120 mammographic images originating from the Digital Database for Screening Mammography (DDSM) (Kwok, 2003) was randomly selected (similar data will be used to test the performance of different machine learning techniques).

The DDSM is a resource for use by the mammographic image analysis research community. The database contains approximately 2,500 studies. Each study includes two images of each breast, along with some associated patient information. One hundred twenty cases were selected randomly from the DDSM, with ages ranging from 35 to 80, to form this dataset.
Figure 8.1 Digitized mammogram showing one manually segmented region
The selected cases contain 60 normal, 30 benign and 30 malignant cases, digitized at 50 micrometers and 12-bit gray level. In the second stage, the suspicious Region of Interest (ROI) was selected, as shown in Figure 8.1. In the third stage, features were selected from the ROI and statistical texture features were calculated for each ROI.
Figure 8.2 illustrates the main steps taken to classify the extracted texture.

Figure 8.2 Diagram showing the main steps in texture analysis and classification
8.4 FEATURE EXTRACTION
The feature extraction procedure relies on texture, which is the main descriptor for all the mammograms. In this work, we concentrate on statistical descriptors that include variance, skewness and the Spatial Gray Level Dependence Matrix (SGLD), or co-occurrence matrix, for texture description. These features are then used in conjunction with the SVM to separate the three classes from each other. In the following sub-sections, we illustrate how these statistical descriptors are calculated.
8.4.1 Gray Level Histogram Moments (GLHM)
Two gray-level-sensitive histogram moments are extracted from the pixel value histogram of each region of interest (ROI) and defined as follows:

Variance:

σ² = (1 / (XY − 1)) Σ_{x=1}^{X} Σ_{y=1}^{Y} [ I(x, y) − μ ]²          (1)

Skewness:

(1 / XY) Σ_{x=1}^{X} Σ_{y=1}^{Y} [ (I(x, y) − μ) / σ ]³          (2)

where X and Y denote the dimensions of the image sub-region, I(x, y) denotes the image sub-region pixel matrix, and μ is the mean of the matrix I(x, y).
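A direct NumPy transcription of equations (1) and (2) might look as follows.

```python
import numpy as np

def histogram_moments(roi):
    """Variance and skewness of an ROI pixel matrix, per equations (1)-(2)."""
    roi = np.asarray(roi, dtype=float)
    mu = roi.mean()
    variance = ((roi - mu) ** 2).sum() / (roi.size - 1)       # equation (1)
    sigma = np.sqrt(variance)
    skewness = (((roi - mu) / sigma) ** 3).sum() / roi.size   # equation (2)
    return variance, skewness
```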
8.4.2 Spatial Gray Level Dependence Matrix (SGLDM)
The co-occurrence probability provides a second-order method for generating texture features. This probability, also known as the Spatial Gray Level Dependency (SGLD), represents the conditional joint probabilities of all pairwise combinations of gray levels in the spatial window of interest, given two parameters: the interpixel distance (d) and the orientation angle (φ). The probability measure can be defined as:

Pr(x) = { C_ij | (d, φ) }          (3)

where C_ij is the co-occurrence probability between gray levels i and j.
Texture features based on the spatial co-occurrence of pixel values are probably the most widely used in digital image analysis. First proposed by Haralick et al. (1973), they characterize texture using a variety of quantities derived from second-order image statistics. Statistics are applied to the Gray Level Co-occurrence Matrix (GLCM) to generate texture features, which are assigned to the center pixel of the image window. In this work, five statistical measures are extracted using the following nomenclature, where the element p(i, j) represents the frequency of occurrence of the two gray levels i and j for a given vector.
Angular second moment (ASM): measures the number of repeated pairs.

Σ_{i=1}^{N} Σ_{j=1}^{N} p(i, j)²          (4)
Inverse difference moment: informs about the smoothness of the image.

Σ_{i=1}^{N} Σ_{j=1}^{N} [ 1 / (1 + (i − j)²) ] p(i, j)          (5)
Entropy: measures the randomness of the gray level distribution.

− Σ_{i=1}^{N} Σ_{j=1}^{N} p(i, j) log( p(i, j) )          (6)
Correlation:

[ Σ_{i} Σ_{j} (ij) p(i, j) − μ_x μ_y ] / (σ_x σ_y)          (7)

where μ_x, μ_y, σ_x and σ_y are the means and standard deviations of the marginal density functions. The probability p(i, j) provides a correlation between the two pixels in the pixel pair.
Contrast:

Σ_{n=0}^{N_g − 1} n² [ Σ_{i=1}^{N_g} Σ_{j=1}^{N_g} p(i, j) ]          (8)

where the inner sum is taken over the pairs with |i − j| = n.
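For illustration, the sketch below builds a normalized co-occurrence matrix for the two directions mentioned in Section 8.3 and evaluates the five measures of equations (4)-(8); the quantization to 16 gray levels is an assumption, since the number of levels used is not stated.

```python
import numpy as np

def glcm(image, d=1, angle=0, levels=16):
    """Normalized gray level co-occurrence matrix for interpixel distance d
    and orientation angle (0 or 90 degrees)."""
    dy, dx = {0: (0, d), 90: (-d, 0)}[angle]
    img = (image.astype(float) / image.max() * (levels - 1)).astype(int)
    p = np.zeros((levels, levels))
    H, W = img.shape
    for y in range(max(0, -dy), min(H, H - dy)):
        for x in range(max(0, -dx), min(W, W - dx)):
            p[img[y, x], img[y + dy, x + dx]] += 1
    return p / p.sum()

def sgld_features(p):
    """The five measures of equations (4)-(8) from a normalized GLCM p."""
    i, j = np.indices(p.shape)
    asm = (p ** 2).sum()                                        # eq. (4)
    idm = (p / (1 + (i - j) ** 2)).sum()                        # eq. (5)
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()              # eq. (6)
    mu_x, mu_y = (i * p).sum(), (j * p).sum()
    sd_x = np.sqrt((((i - mu_x) ** 2) * p).sum())
    sd_y = np.sqrt((((j - mu_y) ** 2) * p).sum())
    corr = ((i * j * p).sum() - mu_x * mu_y) / (sd_x * sd_y)    # eq. (7)
    contrast = sum(k ** 2 * p[np.abs(i - j) == k].sum()
                   for k in range(p.shape[0]))                  # eq. (8)
    return asm, idm, entropy, corr, contrast
```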
8.5 SUPPORT VECTOR MACHINE
The SVM is a powerful classification algorithm well suited to the given task (Cristianini and Shawe-Taylor, 2000; Vapnik, 1998). It addresses the general problem of learning to discriminate between positive and negative members of a given class of n-dimensional vectors. The algorithm operates by mapping the given training set into a possibly high-dimensional feature space and attempting to learn a separating hyperplane between the positive and negative examples that maximizes the margin between them (Zaki, Deris, and Alashwal, 2006). The margin corresponds to the distance between the points residing on the two edges of the hyperplane, as shown in Figure 8.3.
Figure 8.3 Illustration of the hyperplane separation (w·x + b = 0, with margins w·x + b = +1 and w·x + b = −1) between the positive and negative examples in a support vector machine
Having found such a plane, the SVM can then predict the classification of an unlabeled example. The formulation of the SVM is described as follows.

Suppose our training set S consists of labeled input vectors (x_i, y_i), i = 1, ..., m, where x_i ∈ ℝⁿ and y_i ∈ {±1}. We can specify a linear classification rule f by a pair (w, b), where the normal vector w ∈ ℝⁿ and the bias b ∈ ℝ, via

f(x) = (w, x) + b          (9)
where a point x is classified as positive if f(x) > 0. Geometrically, the decision boundary is the hyperplane

{ x ∈ ℝⁿ : (w, x) + b = 0 }          (10)

The idea that makes it possible to deal efficiently with very high dimensional feature spaces is the use of kernels:

K(x, z) = φ(x) · φ(z)   for all x, z ∈ X          (11)

where φ is the mapping from X to an inner product feature space. We thus get the following optimization problem:
max_λ  Σ_{i=1}^{m} λ_i − (1/2) Σ_{i,j=1}^{m} λ_i λ_j y_i y_j K(x_i, x_j)          (12)

subject to the constraints

λ_i ≥ 0,   Σ_{i=1}^{m} λ_i y_i = 0          (13)
Given the labeled feature vectors, we can learn an SVM classifier to separate the normal, benign and malignant classes from each other. The appeal of SVMs is twofold: first, they do not require any complex tuning of parameters, and second, they exhibit a great ability to generalize from small training samples. They are particularly amenable to learning in high dimensional spaces. In this particular implementation, we used the LibSVM software implemented by Chih-Chung Chang and Chih-Jen Lin, available for download at http://www.csie.ntu.edu.tw/~cjlin/libsvm-tools/. We primarily employed the Gaussian kernel, which utilizes radial basis functions. The Gaussian Radial Basis Function (RBF) is used because it allows pockets of data to be classified, which is more powerful than just using a linear dot product. The RBF is computed using the following equation:
k(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )          (14)

where σ is the scaling parameter.
One of the significant parameters needed to tune our system is the soft-margin parameter, or capacity. The soft-margin parameter C controls how much tolerance for error in the classification of training samples we allow (Zaki, Deris, and Alashwal, 2006). The soft-margin parameter C and the number of cross-validation folds are set to 100 and 10, respectively.
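A minimal sketch of this configuration using scikit-learn's SVC (which wraps LibSVM) is shown below; gamma="scale" stands in for the unreported σ of equation (14).

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate(X, y):
    """10-fold cross validation of an RBF-kernel SVM with C = 100, matching
    the soft-margin and cross-validation settings stated above. X holds the
    texture feature vectors and y the class labels (normal/benign/malignant);
    gamma='scale' is an assumption, as sigma was not reported."""
    clf = SVC(kernel="rbf", C=100, gamma="scale")
    scores = cross_val_score(clf, X, y, cv=10)
    return scores.mean()
```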
8.6 RESULTS
The overall classification results obtained for the 120-image dataset are summarized in Figure 8.4, along with the performance of other machine learning (ML) techniques: Linear Discriminant Analysis (LDA), Non-linear Discriminant Analysis (NDA), Principal Component Analysis (PCA) and an Artificial Neural Network (ANN). The results show that the SVM was able to achieve the best accuracy, 82.5%.

We used a two-tailed t-test to assess how significant the differences in accuracy between the SVM and the other ML techniques are. The results, the p-values from the two-tailed t-test, are presented in Table 8.1. As shown in Table 8.1, there are performance differences between the SVM and all four other methods. Table 8.1 also shows that the performance of the SVM is significantly superior to NDA (in bold) at a threshold of 0.05. However, there is no statistically significant difference in accuracy between the SVM and the remaining methods at the same threshold.
Figure 8.4 Comparison of the accuracy (%) of different ML techniques
Table 8.1 Statistical significance differences between pairs of ML techniques

        NDA       PCA       ANN       SVM
LDA     0.1711    0.04910   0.1046    0.1779
NDA               0.1230    0.0677    0.0070
PCA                         0.0557    0.1299
ANN                                   0.0747

8.7 CONCLUSION
Texture analysis is a promising tool for clinical decision making and one of the most valuable and promising areas in breast tissue analysis. Several factors affect its performance and are still not completely understood. In this study we analyzed regions of interest on screening mammograms in order to provide radiologists with an aid for the estimation of tumors. Texture analysis was performed on small ROIs, and five co-occurrence matrix measures were calculated from each ROI. The use of the statistical textural features in conjunction with the SVM delivered accurate results of 82.5%. In conclusion, we suggest that texture analysis can contribute to computer-aided diagnosis of breast cancer. Completion of the proposed method should include a larger dataset and the investigation of additional classification schemes.
8.8 REFERENCES
Arivazhagan, S. and Ganesan, L. (2003). Textural classification
using wavelet transform, Pattern Recognition Letters . pp.
1513--21.
Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.
Haralick, R. M., Shanmugam, K. and Dinstein, I. (1973). Textural features for image classification, IEEE Trans Sys Man Cyb., pp. 610--21.
Kwok, J. T., Zhu, H. and Wang, Y. (2003). Texture classification
using the support vector machines, Pattern Recognition. pp.
2883--93.
Lisboa, P. G. (2002). A review of evidence of health benefit from artificial neural networks, Neural Networks. pp. 11--39.
Mavroforakis, M. E., Georgiou, H. V., Dimitropoulos, N.,
Cavouras, D. and Theodoridis, S. (2006). Mammographic
masses characterization based on localized texture and
dataset fractal analysis using linear, neural and support
vector machine classifiers, Artificial Intelligence in
Medicine. pp. 145--162
Smith, R. A. (1995). Screening women aged 40--49: where are we today? Journal of the National Cancer Institute. pp. 1198--1199.
Tabar, L., Fagerber, G. and Chen, R. H. (1995). Efficacy of breast screening by age: new results from the Swedish two-county trial, Cancer. pp. 1412--1419.
Vapnik, V. N. (1998). Statistical Learning Theory, John Wiley and Sons, New York.
Zaki, N. M., Deris, S. and Alashwal, H. (2006). Protein-Protein Interaction Detection Based on Substring Sensitivity Measure, International Journal of Biomedical Sciences. pp. 148--154.
INDEX
A
activation function, 13, 31
adaptive learning, 40, 45, 61
aircraft, 36, 48, 51
ANFIS, 63, 64, 66, 67, 70, 71, 75, 76, 77,
78, 79
ARIMA, 20, 21, 22, 27, 28, 29, 30, 31,
32, 33, 34, 35, 80, 81
autoregressive, 20, 26, 66
B
back‐propagation, 2, 31, 65, 67, 74
Backpropagation, 15, 35, 60
black‐box, 2
BPNN, 2, 3, 6, 11, 14, 15, 17
breast cancer, 159, 160, 170
C
characteristics, 1, 2, 3, 4, 5, 7, 11, 13,
17, 21, 23, 26, 66, 67, 72, 92
classification, 1, 2, 3, 14, 15, 17, 20, 37,
51, 53, 58, 60, 63, 91, 95, 103, 106,
107, 117, 128, 133, 141, 148, 150,
160, 162, 165, 166, 168, 170
classification accuracy, 14, 15
clustering, 71, 83, 84, 86, 87, 88, 89, 90,
91, 93, 94, 95, 96, 98, 99, 100, 101,
131
conjugate gradient, 42, 43, 44, 45, 46
content‐based, 135, 156, 157
crisp inputs, 68, 69
D
Diagnostic checking, 23
E
e‐learning, 7
empirical results, 34
epoch, 40, 43, 46, 54, 55, 96, 97
Error, 14, 15, 56
F
feature extraction, 49, 134, 135, 136,
138, 155, 163
Felder Silverman, 2, 3, 4, 5, 6
forecasting, 20, 21, 22, 24, 29, 31, 32,
33, 34, 63, 64, 71, 72, 73, 75, 76, 78,
79, 80
fuzzy decision tree, 106, 107
fuzzy inference, 64, 66, 68, 78, 80
G
Gaussian, 69, 75, 76, 77, 85, 134, 167
genetic algorithm, 82, 83, 84, 91, 92,
93, 94, 96, 99
Genetic Algorithm, 91, 92
Geometric Moment, 47, 48
grid computing, 102
Grid computing, 102
H
Hessian matrix, 42, 43, 59
heuristic, 39, 40, 60
hidden, 11, 13, 15, 23, 24, 25, 26, 30,
36, 38, 39, 52, 65, 74, 75
human cognition, 23
hybrid, 17, 21, 26, 27, 28, 31, 33, 34,
67, 68, 71, 83
hyperplane, 166, 167
N
Neural, 1, 6, 14, 15, 16, 17, 18, 19, 20,
24, 34, 35, 60, 61, 81, 100, 101, 156,
168, 170
neural network, 4, 13, 20, 23, 30, 31,
62, 63, 65, 66, 73, 78, 79, 80, 82, 83,
84, 93, 101
nonlinear mapping, 26, 30, 66
normalization, 29, 73, 93
O
I
input neurons, 15, 30, 31, 73
intra‐class, 49
optimization problem, 41, 130, 167
output neurons, 15
P
J
Jacobian matrix, 43, 59
L
lead time, 32
learning dimension, 5, 6, 8, 9, 12, 13,
17
learning resources, 2, 8, 11
learning style, 2, 3, 4, 7, 9, 10, 11, 14,
15, 17
local minimum, 37, 45
M
Machine Learning, 95, 100, 156, 158,
168
Mamdani, 68
mammograms, 159, 160, 163, 170
mean absolute error, 72, 79
mean square error, 29, 37
MSE error, 52, 54, 55
Pattern, 61, 99, 157, 158, 170
pattern recognition, 6, 20, 36, 63, 82
prediction, 22, 27, 32, 33, 78, 79, 80,
103
Q
quasi‐Newton, 41, 42, 43, 46
R
Radial Polynomials, 47
Region of Interest, 161
relevance feedback, 126, 127, 128, 129,
130, 131, 132, 133, 141, 146, 149,
151, 152, 153, 154, 155, 156, 157
representative image, 131, 133, 142,
143, 144, 146, 153
rough set, 83, 84, 86, 88, 89, 90, 91, 93,
95, 98, 99
S
SOM, 82, 83, 84, 85, 86, 93, 94, 95, 96,
97, 98, 99, 101
Spatial, 163, 164
spatial layout, 126, 155
stochastic, 37
Sugeno‐Takagi, 68
supervised learning, 2, 61, 71
SVM, 128, 141, 148, 150, 157, 159, 163,
165, 166, 167, 168, 169, 170
synaptic, 37
T
texture features, 135, 136, 137, 142,
159, 162, 164
two‐tailed t‐test, 168
U
U‐matrix, 86
uncertainty, 83, 93, 96, 98, 99
V
vagueness, 116
W
wavelet filter, 136
Z
Zernike Moment, 36, 47, 50