First Edition 2008

© PUTEH SAAD & HISHAMMUDDIN ASMUNI 2008

Hak cipta terpelihara. Tiada dibenarkan mengeluar ulang mana-mana bahagian artikel, ilustrasi, dan isi kandungan buku ini dalam apa juga bentuk dan cara apa jua sama ada dengan cara elektronik, fotokopi, mekanik, atau cara lain sebelum mendapat izin bertulis daripada Timbalan Naib Canselor (Penyelidikan dan Inovasi), Universiti Teknologi Malaysia, 81310 Skudai, Johor Darul Ta'zim, Malaysia. Perundingan tertakluk kepada perkiraan royalti atau honorarium.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from Universiti Teknologi Malaysia, 81310 Skudai, Johor Darul Ta'zim, Malaysia.

Perpustakaan Negara Malaysia Cataloguing-in-Publication Data

Advances in artificial intelligence applications / chief editor: Puteh Saad; editor: Hishammuddin Asmuni.
ISBN 978-983-52-0623-8
1. Artificial intelligence. I. Puteh Saad. II. Hishammuddin Asmuni.
006.3

Editor: Puteh Saad & Rakan
Pereka Kulit: Mohd Nazir Md. Basri & Mohd Asmawidin Bidin

Diatur huruf oleh / Typeset by
Fakulti Sains Komputer & Sistem Maklumat

Diterbitkan di Malaysia oleh / Published in Malaysia by
PENERBIT UNIVERSITI TEKNOLOGI MALAYSIA
34-38, Jln. Kebudayaan 1, Taman Universiti, 81300 Skudai, Johor Darul Ta'zim, MALAYSIA.
(PENERBIT UTM anggota PERSATUAN PENERBIT BUKU MALAYSIA / MALAYSIAN BOOK PUBLISHERS ASSOCIATION dengan no. keahlian 9101)

Dicetak di Malaysia oleh / Printed in Malaysia by
UNIVISION PRESS SDN. BHD.
Lot. 47 & 48, Jalan SR 1/9, Seksyen 9, Jalan Serdang Raya, Taman Serdang Raya, 43300 Seri Kembangan, Selangor Darul Ehsan, MALAYSIA.

CONTENTS

PREFACE

CHAPTER 1  NEURAL NETWORK FOR CLASSIFYING STUDENT LEARNING CHARACTERISTICS IN E-LEARNING
Nor Bahiah Hj Ahmad, Siti Mariyam Hj Shamsuddin

CHAPTER 2  A HYBRID ARIMA AND NEURAL NETWORK FOR YIELDS PREDICTION
Ruhaidah Samsudin, Ani Shabri

CHAPTER 3  A PERFORMANCE STUDY OF ENHANCED BP ALGORITHMS ON AIRCRAFT IMAGE CLASSIFICATION
Puteh Saad, Nursafawati Mahshos, Subariah Ibrahim, Rusni Darius

CHAPTER 4  ANFIS FOR RICE YIELDS FORECASTING
Ruhaidah Samsudin, Puteh Saad, Ani Shabri

CHAPTER 5  HYBRIDIZATION OF SOM AND GENETIC ALGORITHM TO DETECT UNCERTAINTY IN CLUSTER ANALYSIS
E. Mohebi, Mohd. Noor Md. Sap

CHAPTER 6  A MINING-BASED APPROACH FOR SELECTING BEST RESOURCES NODES ON GRID RESOURCE BROKER
Asgarali Bouyer, Mohd Noor Md Sap

CHAPTER 7  RELEVANCE FEEDBACK METHOD FOR CONTENT-BASED IMAGE RETRIEVAL
Ali Selamat, Pei-Geok Lim

CHAPTER 8  DETECTING BREAST CANCER USING TEXTURE FEATURES AND SUPPORT VECTOR MACHINE
Al Mutaz Abdalla, Safaai Deris, Nazar Zaki

INDEX
PREFACE

Various authors provide their own views of Artificial Intelligence (AI). Russell and Norvig summarize AI as being embedded in an agent that is able to think humanly and rationally and act humanly and rationally. Due to its wide coverage, AI is divided into the following branches: natural language processing, knowledge representation, automated reasoning, machine learning, computer vision and robotics. In this book, machine learning is applied to solve a myriad of problems occurring in various domains, ranging from e-learning, prediction, bioinformatics, content-based retrieval, data mining and image recognition to grid computing.

A number of algorithms have been developed in machine learning. The algorithms covered in this volume are Artificial Neural Network (ANN), Fuzzy Logic (FL), Genetic Algorithm (GA), and Support Vector Machine (SVM). ANN is suitable for solving problems related to learning and adaptation. FL is suited to decision-making problems that involve imprecise knowledge. GA is utilized to solve optimization problems that are categorized as NP-Hard. SVM is an excellent classifier for high-dimensional data.

There are many ANN algorithms available. They can be categorized into supervised, unsupervised and reinforcement learning, and each type of learning is suited to different applications. In supervised learning, the learning task consists of two phases: training and testing. The learning rule used here is called an Error Correction technique, also known as a Gradient Descent technique. The generalization capability achieved during the training phase is tested during the testing phase by replacing the input samples with new data. The Back Propagation (BP) algorithm is a favourite among researchers, since it is effective and is widely used to solve various kinds of problems. In this volume, the BP algorithm is utilized in Chapter 1 to classify student learning styles based on their learning preferences and behavior. Chapter 2 displays its potential in solving the yield forecasting problem. BP can also solve the image classification problem, as reported in Chapter 3. On the other hand, in unsupervised learning the network by itself explores the correlations between patterns in the data and organizes the patterns into classes based on those correlations, without being trained. The common learning rules implemented in unsupervised learning are Hebbian rules and Competitive rules. The Self-Organising Map (SOM) is an example of an algorithm that implements competitive rules. Chapter 5 highlights the capability of SOM integrated with a genetic algorithm to detect uncertainty in cluster analysis.

Genetic Algorithms are directed random search techniques used to look for parameters that provide an optimal solution to an NP-Hard or NP-Complete problem. GA begins with a set of solutions (represented by chromosomes) called a population. Solutions from one population are taken and used to form a new population. Solutions selected to form new solutions (offspring) are chosen according to their fitness: the more suitable they are, the more chances they have to reproduce. Chromosomes are filtered iteratively using mutation and crossover operators until chromosomes that fulfill a desired objective are found. In Chapter 5, GA is combined with SOM to detect uncertainty in cluster analysis.
Fuzzy logic is a set of mathematical principles for knowledge representation based on degrees of membership. Unlike two-valued Boolean logic, fuzzy logic is multi-valued: it deals with degrees of membership and degrees of truth, using the continuum of logical values between 0 (completely false) and 1 (completely true). Chapter 6 illustrates the utilization of FL in constructing a fuzzy decision tree to select the best resource nodes in grid computing.

Support Vector Machines (SVMs) are a set of related supervised learning methods used for classification and regression. An SVM constructs a hyperplane in an n-dimensional space separating two sets of input vectors such that the margin between the two data sets is maximized. To calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, which are "pushed up against" the two data sets. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes, since in general the larger the margin, the better the generalization error of the classifier. Chapters 7 and 8 describe the application of SVM to two different problems. Chapter 7 uses SVM to provide relevance feedback to the user for content-based retrieval of images. Chapter 8 classifies cancerous data samples based on texture features.

Puteh Saad
Hishammuddin Asmuni
Faculty of Computer Science and Information System
Universiti Teknologi Malaysia
2008

1 NEURAL NETWORK FOR CLASSIFYING STUDENT LEARNING CHARACTERISTICS IN E-LEARNING

Nor Bahiah Hj Ahmad
Siti Mariyam Hj Shamsuddin

1.1 INTRODUCTION

A Neural Network (NN) is an information processing paradigm inspired by biological nervous systems, such as the brain, which process information received through the senses. It has been used widely in many applications such as automotive, aerospace, banking, medical, robotics, electronics and transportation. An NN is able to learn complex non-linear input-output relationships and is adaptive to its environment. NNs have been used extensively for user modeling, mainly for classification and recommendation, in order to group users with the same characteristics and create profiles (Frias-Martinez et al., 2005). Some examples are Bidel, Lemoine, and Piat (2003), who use an NN to classify user navigation paths, and Stathacopoulou et al. (2006) and Villaverde et al. (2006), who use NNs to assess students' learning styles.

A problem which arises when trying to apply NNs to model human behavior is knowledge representation (Stathacopoulou et al., 2006). The black-box characteristics of an NN cannot help much, since the learned weights are often difficult for humans to interpret. To alleviate the situation, the back-propagation network (BPNN), a supervised learning algorithm, reduces the global error produced by the network over the weight space. This chapter discusses the implementation of a BPNN to represent and detect students' learning styles in a web-based education system. A learning system that provides learning resources according to the Felder Silverman (FS) learning style has been developed and tested on Universiti Teknologi Malaysia (UTM) students taking the Data Structure subject. In this chapter, we describe the classification of student learning styles based on their learning preferences and behavior.
Learning style has become a significant factor contributing to learner progress (Magoulas et al., 2003) and an important consideration in designing an on-line learning system. It is important to diagnose students' learning styles because some students learn more effectively when taught with personalised methods. Information about the learning style can help the system become more sensitive to the differences among the students using it. Understanding learning styles can improve the planning, producing, and implementing of educational experiences, so that they are more appropriately tailored to students' expectations, in order to enhance their learning, retention and retrieval (Carver and Howard, 1996).

Typically, questionnaires are used to diagnose learning characteristics. However, using questionnaires is a time-consuming and unreliable method for acquiring learning style characteristics and may not be accurate (Villaverde et al., 2006; Stash and de Bra, 2004; Kelly and Tangney, 2004). Once the profile is generated, it is static and does not change regardless of user interaction. Nevertheless, a student's learning characteristics change when given different tasks in an on-line learning environment.

In the first section of the chapter, we describe the Felder Silverman learning style model. Then the process of capturing and analyzing student behavior while learning using hypermedia is outlined. The subsequent section discusses the analysis of the distribution of the learners' learning styles, preferences and navigation behavior. The classification of students based on the integration of FS features using BPNN is then described. The chapter concludes with a brief discussion and the main conclusions drawn from the experiment conducted.

1.2 FELDER SILVERMAN LEARNING STYLE MODEL

The Felder-Silverman (FS) learning style model was developed by Felder and Silverman in 1988. This model categorizes a student's dominant learning style along a scale of four dimensions: active-reflective (how information is processed), sensing-intuitive (how information is perceived), visual-verbal (how information is presented) and global-sequential (how information is understood). Table 1.1 describes the characteristics of students based on the learning dimensions. The model has been successfully used in previous studies involving adaptation of learning material, collaborative learning and traditional teaching (Felder and Silverman, 1988; Zywno, 2003; Carmo et al., 2007). Furthermore, a hypermedia learning system that incorporates learning components such as navigation tools, the presentation of learning material in graphic form, simulation, video, sound and help facilities can easily be tailored to the FS learning style dimensions.

Felder and Soloman developed the Index of Learning Styles (ILS) questionnaire to identify students' learning styles. The objective of this questionnaire is to determine the dominant learning style of a student (active-reflective, sensing-intuitive, visual-verbal, and sequential-global). This study integrates the processing, perception, input and understanding learning styles to map the characteristics of the students into 16 learning styles. Table 1.2 lists the 16 learning styles proposed in the Integrated Felder Silverman (IFS) model.
The rationale for the integration of these dimensions is to minimize the time consumed in diagnosing the learning styles. Previous research, such as the work done by Villaverde et al. (2006), Kelly and Tangney (2004), Garcia et al. (2006), Graf and Kinshuk (2006), Yaanibelli et al. (2006) and Lo and Shu (2005), attempted to detect learning styles based on student behavior and interaction while learning on-line. Various techniques have been used to represent student models, such as neural networks (Villaverde et al., 2006; Lo and Shu, 2005), Genetic Algorithms (Yaanibelli et al., 2006), FS measurement (Graf and Kinshuk, 2006), fuzzy logic, bayesian networks (Kelly and Tangney, 2004; Garcia et al., 2006), and case-based reasoning.

Table 1.1 Felder Silverman learning dimensions and learner characteristics (Felder and Silverman, 1988)

Dimension: Processing
  Active: Retain and understand information best by doing something active with it, such as discussing it, applying it, or explaining it to others.
  Reflective: Prefer observation rather than active experimentation. Tend to think about information quietly first.

Dimension: Perception
  Sensor: Like learning facts; often like solving problems by well-established methods and dislike complications and surprises. Patient with details and good at memorizing facts and doing hands-on work. More practical and careful than intuitors.
  Intuitive: Prefer discovering possibilities and relationships. Like innovation and dislike repetition. Better at grasping new concepts and comfortable with abstractions and mathematical formulations. Tend to work faster and be more innovative than sensors.

Dimension: Input
  Visual: Remember best what they see from visual representations such as graphs, charts, pictures and diagrams.
  Verbal: More comfortable with verbal information such as written texts or lectures.

Dimension: Understanding
  Sequential: Prefer to access well-structured information sequentially, studying each subject step by step.
  Global: Prefer to learn in large chunks, absorbing material randomly without seeing connections and then suddenly getting it. Able to solve complex problems quickly or put things together in novel ways once they have grasped the big picture.

Table 1.2 Sixteen learning styles based on the Felder Silverman learning dimensions

 Learning Style                               Label
 1.  Active/Sensor/Visual/Sequential          ASViSq
 2.  Reflective/Sensor/Visual/Sequential      RSViSq
 3.  Active/Intuitive/Visual/Sequential       AIViSq
 4.  Reflective/Intuitive/Visual/Sequential   RIViSq
 5.  Active/Sensor/Verbal/Sequential          ASVbSq
 6.  Reflective/Sensor/Verbal/Sequential      RSVbSq
 7.  Active/Intuitive/Verbal/Sequential       AIVbSq
 8.  Reflective/Intuitive/Verbal/Sequential   RIVbSq
 9.  Active/Sensor/Visual/Global              ASViG
 10. Reflective/Sensor/Visual/Global          RSViG
 11. Active/Intuitive/Visual/Global           AIViG
 12. Reflective/Intuitive/Visual/Global       RIViG
 13. Active/Sensor/Verbal/Global              ASVbG
 14. Reflective/Sensor/Verbal/Global          RSVbG
 15. Active/Intuitive/Verbal/Global           AIVbG
 16. Reflective/Intuitive/Verbal/Global       RIVbG
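The 16 IFS styles in Table 1.2 are simply the Cartesian product of the two poles of each of the four FS dimensions. A minimal sketch that enumerates the same 16 combinations and their labels (in a different order from the table):

```python
from itertools import product

# The four FS dimensions and the short codes used in Table 1.2.
dimensions = [
    [("Active", "A"), ("Reflective", "R")],    # processing
    [("Sensor", "S"), ("Intuitive", "I")],     # perception
    [("Visual", "Vi"), ("Verbal", "Vb")],      # input
    [("Sequential", "Sq"), ("Global", "G")],   # understanding
]

# Enumerate the 16 IFS styles as the Cartesian product of the dimensions.
for combo in product(*dimensions):
    name = "/".join(pole for pole, _ in combo)
    label = "".join(code for _, code in combo)
    print(f"{name:42s} {label}")   # e.g. Active/Sensor/Visual/Sequential  ASViSq
```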
The novelty of our study is to classify the proposed IFS features of students' learning styles by employing the BPNN technique. Neural networks have been chosen for the following reasons:

(a) Their pattern recognition ability on imprecise or not fully understood data.
(b) Their ability to generalise and learn from specific examples.
(c) Their ability to be updated quickly with extra parameters.
(d) Their speed of execution, which makes them ideal for real-time applications.

1.3 METHODOLOGY

In order to determine which characteristics of the students can be used to identify their learning style, we conducted an experiment on 115 UTM students. Among them, 75 are Computer Science students and 40 are Computer Engineering students who took the Data Structure subject. During the study, the students were required to attend lectures, participate in lab exercises, work in groups to solve given problems, self-study using the e-learning system, participate in forum discussions and take an on-line quiz. All materials can be accessed through a hypermedia learning system which was integrated into the learning system provided by UTM. During the study, the students were required to answer two questionnaires: the ILS questionnaire and a questionnaire to obtain feedback regarding the system used and the students' preferences among the learning material.

1.4 DEVELOPMENT OF THE LEARNING SYSTEM

When designing the learning material, it is important to accommodate elements that reflect individual differences in learning (Bajraktarevic, 2003). Systems such as iWeaver (Wolf, 2003), CS383 (Carver and Howard, 1996), the system of Graf and Kinshuk (2005), INSPIRE (Papanikolau et al., 2003) and SAVER (Lo and Shu, 2005) proposed several learning components to be tailored according to the learning styles of the students. We adopted the components implemented in iWeaver, CS383 and INSPIRE in our system due to the success of those studies. The learning resources are structured into components that are suitable for the processing, perception, input and understanding learning dimensions. The resource materials provided in the learning system are as follows:

Forum – Provides a mechanism for active discussions among students.
Animation – Provides simulations of various sorting techniques. The process of how each sorting technique is implemented can be viewed step by step according to the chosen algorithm.
Sample Codes – Provide source codes that students can view, and various sorting programs that they can actively execute.
Hypertext – Provides learning content which is composed of theory and concepts. The learning content consists of topic objectives, sub-modules, and navigation links.
Power Point Slideshow – Provides examples and descriptions in the form of text, pictures and animations.
Exercises – Designed as multiple choice questions which students can answer to get hints and feedback regarding their performance.
Assessment on-line – An on-line quiz that consists of multiple choice questions, with marks displayed immediately after the student submits the quiz.

Table 1.3 lists the learning resources and the learning styles that match the resources.
Table 1.3 Learning resources developed based on the FS learning dimensions

Dimension: Processing
  Active: Post and reply forum; Exercise; Simulation; Code execution
  Reflective: View forum; Hypertext access

Dimension: Perception
  Sensor: # of backtracks in hypertext; concrete material (Hypertext); access to examples; Exercises; Exam delivery duration; Exam revision
  Intuitive: Abstract material (Hypertext)

Dimension: Input
  Visual: Simulation; Simulation coverage; Code execution
  Verbal: Hypertext; PPt Slide

Dimension: Understanding
  Sequential: Hypertext – navigate linearly
  Global: Hypertext coverage; # of visits to Course overview

1.5 ANALYSIS OF THE QUESTIONNAIRE

The purpose of the ILS questionnaire is to determine the learning style of UTM students. Figure 1.1 shows the distribution of learning styles collected from 115 students taking the Data Structure subject in UTM. They were required to fill in the ILS questionnaire in order to determine their learning styles. From the survey, we found that only 14 learning styles exist among the students. The majority of the students have the Active/Sensor/Visual/Sequential learning style, followed by Reflective/Sensor/Visual/Sequential. The result is consistent with the study done by Zywno (2003), who concluded that the default learning style is Active/Sensor/Visual/Sequential.

Figure 1.1 Distribution of learning styles among UTM students

However, in this study we found that no students fall into two categories of learning style, namely Active/Intuitive/Verbal/Sequential and Reflective/Sensor/Verbal/Global. The main reason these two learning styles are absent is that there are only twelve students with verbal learning styles, which is not enough to cover all 16 learning styles.

The second questionnaire attempts to get feedback from the students regarding the system and their preferences. Table 1.4 summarizes the preferences for the learning resources based on learning style.

1.6 CLASSIFICATION USING NEURAL NETWORK

In this study, we used a Back-propagation Neural Network to model the learning style dimensions. A BPNN is generally composed of an input layer, one or more hidden layers and an output layer. When the network is given an input, the updating of activation values propagates forward from the input layer of processing units through each internal layer to the output layer of processing units. The output units then provide the network's response. When the network corrects its internal parameters, the correction mechanism starts with the output units and propagates backward through each internal layer to the input layer. Back-propagation can adapt two or more layers of weights and uses more sophisticated learning rules; its power lies in its ability to train hidden layers.

This initial study attempts to explore the feasibility of BPNN for classifying student learning styles. The first experiment classified the students' characteristics into the four FS dimensions. The second experiment classified the students' characteristics into the integrated FS dimensions. The results of the two experiments are compared in order to study their performance.
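As a concrete illustration of the forward and backward passes just described, the following is a minimal sketch of a one-hidden-layer BPNN with sigmoid units in Python/NumPy. The layer sizes, learning rate and initialization here are illustrative placeholders, not the chapter's actual settings (those appear in Table 1.5):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)

# Illustrative sizes only; Table 1.5 lists the sizes used per FS dimension.
n_in, n_hidden, n_out = 6, 4, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden -> output weights
b2 = np.zeros(n_out)

lr = 0.2  # learning rate

def train_step(x, target):
    """One forward pass and one backward (error-correction) pass."""
    global W1, b1, W2, b2
    # Forward pass: activations propagate input -> hidden -> output.
    z = sigmoid(b1 + x @ W1)
    y = sigmoid(b2 + z @ W2)
    # Backward pass: the correction starts at the output units...
    delta_out = (y - target) * y * (1 - y)          # sigmoid derivative
    # ...and propagates back through the hidden layer.
    delta_hid = (delta_out @ W2.T) * z * (1 - z)
    # Gradient-descent weight updates.
    W2 -= lr * np.outer(z, delta_out); b2 -= lr * delta_out
    W1 -= lr * np.outer(x, delta_hid); b1 -= lr * delta_hid
    return float(((y - target) ** 2).sum())
```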
Table 1.4 List of attributes and values for the Felder learning dimensions

 Label  Attribute Name                   Value          Dimension
 A1     Post and reply forum             Much/Few       Active/Reflective
 A2     # of Exercises visited           Much/Few       Active/Reflective
 A3     # of Simulations visited         Much/Few       Active/Reflective
 A4     # of Code executions             Much/Few       Active/Reflective
 A5     # of viewings/readings of forum  Much/Few       Active/Reflective
 A6     Hypertext coverage               Much/Few       Active/Reflective
 A7     # of backtracks in hypertext     Much/Few       Sensor/Intuitive
 A8     Concrete material (Hypertext)    Much/Few       Sensor/Intuitive
 A9     Abstract material (Hypertext)    Much/Few       Sensor/Intuitive
 A10    Access to examples               Much/Few       Sensor/Intuitive
 A11    # of Exercises visited           Much/Few       Sensor/Intuitive
 A12    Exam delivery duration           Quick/Slow     Sensor/Intuitive
 A13    Exam revision                    Much/Few       Sensor/Intuitive
 A14    # of Simulations visited         Much/Few       Visual/Verbal
 A15    # of diagram/picture viewings    Much/Few       Visual/Verbal
 A16    Hypertext coverage               Much/Few       Visual/Verbal
 A17    PowerPoint Slide access          Much/Few       Visual/Verbal
 A18    Hypertext – navigate linearly    Linear/Global  Sequential/Global
 A19    Hypertext coverage               Much/Few       Sequential/Global
 A20    # of visits to Course overview   Much/Few       Sequential/Global

1.7 DATA DESIGN AND KNOWLEDGE REPRESENTATION

From the analysis of the learning component preferences of the students, we simulated data that represent the characteristics of the students based on the learning styles. Table 1.4 lists the attributes for the FS learning dimensions. There are 20 attributes identified, which are mapped into the 16 learning styles listed in Table 1.2. Attributes A1-A6 were used to identify Active/Reflective learners, attributes A7-A13 to identify Sensor/Intuitive learners, attributes A14-A17 to identify Visual/Verbal learners and attributes A18-A20 to identify Sequential/Global learners.

1.8 EXPERIMENTS AND RESULTS

We simulated data based on the attributes and values listed in Table 1.4 that represent the characteristics of students based on the FS learning model. During the experiment, 80% of the data was used for training and 20% for testing. In designing the BP neural network, we took into account the parameters involved, such as the number of training data, the number of hidden layers, and the number of processing units in the input, hidden and output layers. Table 1.4 lists the input variables for each learning dimension, while Table 1.5 shows the structure of the neural networks trained on the simulated data. In this study, we selected the sigmoid transfer function, since it is a better choice for classification problems (Paridah et al., 2001); we implemented $f(x) = 1/(1+e^{-x})$ as the activation function. We then ran the neural networks on several unseen data sets; the classification results are shown in Table 1.6. Overall, we found that the classification accuracy for Active/Reflective is 100%, for Sensing/Intuitive 98.5%, for Visual/Verbal 100% and for Sequential/Global 96%. Based on the accuracy of the testing results, we conclude that the BPNN is able to classify the students' learning dimensions accurately.

Table 1.5 Neural network architecture for classifying the FS dimensions

 FS Dimension       Input Neurons  Hidden Neurons  Learning Rate  Momentum  Error Rate
 Active/Reflective  6              4               0.2            0.7       0.005
 Sensing/Intuitive  7              5               0.2            0.7       0.005
 Visual/Verbal      4              3               0.2            0.7       0.005
 Sequential/Global  3              2               0.2            0.7       0.005
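The 80/20 train/test split used in these experiments is straightforward to reproduce. A minimal sketch; the binary 0/1 encoding of the Much/Few attribute values, the random seed, and the sample count (1190, as used for the IFS experiment below) are assumptions for illustration, not the chapter's actual encoding:

```python
import numpy as np

rng = np.random.default_rng(42)

def split_80_20(X, y):
    """Shuffle the samples, then use 80% for training and 20% for testing."""
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    train, test = idx[:cut], idx[cut:]
    return X[train], y[train], X[test], y[test]

# Illustrative data: 1190 simulated samples, 20 binary attributes (A1-A20),
# and 4 output bits matching the 4 output neurons of the IFS network.
X = rng.integers(0, 2, size=(1190, 20)).astype(float)
y = rng.integers(0, 2, size=(1190, 4)).astype(float)
X_tr, y_tr, X_te, y_te = split_80_20(X, y)
print(X_tr.shape, X_te.shape)  # (952, 20) (238, 20)
```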
Table 1.6 FS dimension classification results

 FS Dimension       Training Accuracy  Testing Accuracy
 Active/Reflective  100%               100%
 Sensing/Intuitive  100%               98.5%
 Visual/Verbal      100%               100%
 Sequential/Global  100%               96.0%

In the second experiment, we combined the twenty attributes belonging to the four FS dimensions in order to classify the students based on the integrated FS. In this experiment we intend to classify students into the sixteen learning styles. We simulated 1190 data samples and divided them into 4 groups. We ran the BPNN 4 times on different testing and training samples. The BPNN architecture for classifying students based on the integrated FS is listed below:

Back-propagation Neural Network Architecture
  Input neurons  – 20
  Hidden neurons – 12
  Output neurons – 4
  Learning rate  – 0.002
  Error rate     – 0.005

Figure 1.2 shows the topological structure of the BPNN used for the classification. The architecture consists of 20 input neurons, 12 hidden neurons, and 4 output neurons. A trial-and-error approach was used to find a suitable number of hidden neurons that provides the best classification accuracy.

Figure 1.2 Neural network architecture

The classification accuracy results were analyzed and are shown in Table 1.7. The results reveal that the average testing accuracy is satisfactory at 94.75%. We found that the students' categories classified using the integrated FS dimensions mostly match the results obtained with the individual FS dimensions. Based on the results, the integrated FS dimensions can be used to classify student learning styles accurately and much faster compared to the individual classification of the learning style dimensions.

Table 1.7 Results for the classification of IFS

 Sample No.  Training Accuracy  Testing Accuracy
 Sample 1    100%               96%
 Sample 2    98%                94%
 Sample 3    98%                96%
 Sample 4    100%               93%
 Average     99%                94.75%

1.9 CONCLUSION AND FUTURE WORK

This research identified several issues related to learning styles, which are an important feature of adaptation in hypermedia learning environments. Experiments done using BPNN showed that the BPNN was able to classify the learning dimensions of a student by examining the student's interaction with the hypermedia learning system. The results showed that BPNN performed well in identifying the learning styles. In future work, we will hybridize Neural Networks and Rough Sets to improve the classification performance in terms of processing time. We will also analyze the real behavior and characteristics of the students as traced from the web log in order to diagnose the students' learning styles.

1.10 REFERENCES

Bajraktarevic, N., Hall, W. and Fullick, P. (2003). Incorporating Learning Styles in Hypermedia Environment: Empirical Evaluation. Proceedings of AH2003: Workshop on Adaptive Hypermedia and Adaptive Web-based Systems, Budapest, Hungary, 41-52.

Carver, C. A., Howard, R. A. and Lavelle, E. (1996). Enhancing Student Learning by Incorporating Learning Styles into Adaptive Hypermedia. Proceedings of ED-MEDIA'96 World Conference on Educational Multimedia and Hypermedia, pp. 118-123.

Felder, R. and Soloman, B. Index of Learning Styles Questionnaire.
Retrieved 6 February, 2006, from http://www.engr.ncsu.edu/learningstyles/ilsweb.html

Felder, R. and Silverman, L. (1988). Learning and Teaching Styles in Engineering Education. Engineering Education, 78(7), pp. 674-681. http://www.ncsu.edu/felderpublic/Papers/LS1988.pdf

Graf, S. and Kinshuk. (2006). An Approach for Detecting Learning Styles in Learning Management Systems. Proceedings of the International Conference on Advanced Learning Technologies. IEEE Computer Society, pp. 161-163.

Kelly, D. and Tangney, B. (2004). Predicting Learning Characteristics in a Multiple Intelligence Based Tutoring System. LNCS Volume 3220/2004. Springer, Berlin/Heidelberg.

Lo, J. and Shu, P. (2005). Identification of Learning Styles Online by Observing Learners' Browsing Behaviour through a Neural Network. British Journal of Educational Technology, 36(1), 43-55.

Magoulas, G., Papanikolaou, K. and Grigoriadou, M. (2003). Adaptive Web-Based Learning: Accommodating Individual Differences through System's Adaptation. British Journal of Educational Technology, 34(4).

Papanikolau, K., Grigoriadou, M., Knornilakis, H. and Magoulas, G. (2003). Personalizing the Interaction in a Web-based Educational Hypermedia System: the case of INSPIRE. User Modeling and User-Adapted Interaction, (13), 213-267.

Paridah, S., Nor Bahiah, A., Norazah, Y., Siti Zaiton, M. H. and Siti Mariyam, S. (2001). Neural Network Application on Knowledge Acquisition for Adaptive Hypermedia Learning. Chiang Mai Journal of Science, 28(1), 65-70.

Stash, N. and de Bra, P. (2004). Incorporating Cognitive Styles in AHA! (The Adaptive Hypermedia Architecture). Proceedings of the IASTED International Conference on Web-Based Education, 2004, Austria.

Villaverde, J., Godoy, D. and Amandi, A. (2006). Learning Styles' Recognition in E-Learning Environments with Feed-Forward Neural Networks. Journal of Computer Assisted Learning, 22(3), pp. 197-206.

Wolf, C. (2003). iWeaver: Towards an Interactive Web-Based Adaptive Learning Environment to Address Individual Learning Styles. Australasian Computing Education Conference (ACE2003), Adelaide, Australia.

Zywno, M. S. (2003). A Contribution to Validation of Score Meaning for Felder-Soloman's Index of Learning Styles. Proc. of the 2003 American Society for Engineering Education Annual Conference and Exposition. Online at http://www.ncsu.edu/felderpublic/ILSdir/Zywno_Validation_Study.pdf, Retrieved August 2, 2006.

2 A HYBRID ARIMA AND NEURAL NETWORK FOR YIELDS PREDICTION

Ruhaidah Samsudin
Ani Shabri

2.1 INTRODUCTION

The accuracy of time series forecasting is fundamental to many decision processes, and hence research into improving the effectiveness of forecasting models has never stopped (Zhang, 2003). Recent research activities in artificial neural networks (ANN) have shown powerful pattern classification and pattern recognition capabilities. One major application area of ANN is forecasting (Sharda, 1994). ANN provides an attractive alternative tool for both forecasting researchers and practitioners, and has shown its nonlinear modeling capability in time series forecasting. The ARIMA model is one of the most popular models in traditional time series forecasting and is often used as a benchmark model against which other models are compared.
The popularity of the autoregressive integrated moving average (ARIMA) model is due to its statistical properties as well as the well-known Box-Jenkins methodology for the model building process. However, the ARIMA model is only a class of linear models, and thus it can only capture the linear features of a time series. ARIMA models and ANNs are often compared, with mixed conclusions in terms of superiority in forecasting performance. Although the ANN model achieves success in many time series forecasting applications, it has some disadvantages. A survey of the literature shows that both ARIMA and ANN models have performed well in different cases (Zhang, 1999). Since the real world is highly complex, linear and nonlinear patterns may exist in a time series simultaneously. It is not sufficient to use only a nonlinear model for a time series, because the nonlinear model might miss some of its linear features (Zou et al., 2007).

In real life, time series often contain both linear and nonlinear patterns. If this is the case, there is no universal model that is suitable for all kinds of time series data. Both ARIMA and ANN models have achieved success in their own linear or nonlinear domains, but neither can adequately model and predict such time series, since linear models cannot deal with nonlinear relationships, while the ANN model alone is not able to handle both linear and nonlinear patterns equally well (Zhang, 2003). The ARIMA model is a class of linear models that can capture a time series' linear characteristics, while ANN models are a class of general function approximators capable of modeling non-linearity and capturing nonlinear patterns in time series. Thus, the objective of this paper is to develop a hybrid model for rice yield forecasting. This model combines a linear time series model (ARIMA) and a nonlinear ANN model, since the two models are complementary.

2.2 ARIMA MODEL

George Box and Gwilym Jenkins (1970) developed the ARIMA models, which became popular at the beginning of the 1970s. The general ARIMA model is composed of a seasonal and a non-seasonal part and is represented in the following way:

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d \nabla_s^D\, x_t = \theta_q(B)\,\Theta_Q(B^s)\,a_t \qquad (1)$$

where $\phi_p(B)$ and $\theta_q(B)$ are polynomials in $B$ of order $p$ and $q$, respectively; $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are polynomials in $B^s$ of degrees $P$ and $Q$, respectively; $p$ is the order of non-seasonal autoregression; $d$ the number of regular differencings; $q$ the order of the non-seasonal moving average; $P$ the order of seasonal autoregression; $D$ the number of seasonal differencings; $Q$ the order of the seasonal moving average; and $s$ the length of the season. For the time series forecasting task, the prediction model has the general form

$$x_t = f(x_{t-1}, x_{t-2}, \ldots, x_{t-p}) + e_t \qquad (2)$$

The Box-Jenkins methodology is basically divided into four stages: identification, estimation, diagnostic checking and forecasting. The identification stage involves transforming the data, if necessary, to improve the normality and stationarity of the time series. The next step is choosing a suitable model by analyzing both the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the stationary series. Once a model is identified, the parameters of the model are estimated, and it is necessary to check whether the model assumptions are satisfied. Diagnostic checking using the ACF and PACF of the residuals is then carried out, as described in Brockwell and Davis (2002). The forecasting model is then used to compute the fitted values and forecast values.
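The Box-Jenkins cycle just described maps directly onto standard time series tooling. A minimal sketch using Python's statsmodels (an assumption — the chapter does not name its software); the seasonal order (0,1,1)×(1,0,1) with period 27 anticipates the model selected later in this chapter, and `series` is a synthetic placeholder for the rice yield data:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder series standing in for the 432 rice yield observations.
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=432)) + 100.0

# Identification has suggested a seasonal model with period 27;
# estimation fits its parameters by maximum likelihood.
model = SARIMAX(series, order=(0, 1, 1), seasonal_order=(1, 0, 1, 27))
fit = model.fit(disp=False)

# Diagnostic checking: the residuals should resemble white noise.
resid = fit.resid
print(fit.aic)

# Forecasting: one-step-ahead predictions for the next 27 points.
forecast = fit.forecast(steps=27)
```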
2.3 THE NEURAL NETWORK FORECASTING MODEL

The ANN model is an information processing system that has certain performance characteristics in common with biological neural networks. ANNs have been developed as generalizations of mathematical models of human cognition or neural biology. A typical neural network as used in the present study is shown in Figure 2.1. This is called a feed-forward type of network. In general, it is composed of three layers: the input layer, the hidden layer and the output layer. Each layer has a certain number of processing elements called neurons. Signals are passed between neurons over connection links. Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted. Each neuron applies a transfer function to its net input to determine its output signal. The input layer of the network consists of $n$ units $(x_1, x_2, \ldots, x_n)$ and one bias unit $(x_0)$. The hidden layer consists of $m$ units $(z_1, z_2, \ldots, z_m)$ and one bias unit $(z_0)$, while the output layer has one unit $(y)$, which is the value to be predicted. The bias units have the value one as their input signal.

Figure 2.1 Three-layer back propagation neural network

To build a model for forecasting, the training of the ANN by the back propagation process is carried out in the following steps.

Step 1: Set all the weights and threshold levels of the network.

Step 2: Calculate the actual outputs of the neurons in the hidden layer:

$$z_j = f\Big(\theta_j + \sum_{i=1}^{n} x_i w_{ij}\Big) \qquad (3)$$

where $n$ is the number of inputs of neuron $j$ in the hidden layer, $\theta_j$ is the threshold applied to the neuron, $w_{ij}$ is the weight of the connection between input unit $i$ and hidden unit $j$, and $f(\cdot)$ is the transfer function, typically the sigmoid function given by

$$f(t) = \frac{1}{1 + e^{-t}} \qquad (4)$$

Step 3: Calculate the actual output of the neuron in the output layer:

$$y = f\Big(\theta + \sum_{j=1}^{m} z_j w_j\Big) \qquad (5)$$

where $m$ is the number of hidden units.

Step 4: Update the weights in the back propagation network using error back propagation. The weights are updated using the following equation:

$$w(t+1) = w(t) + \Delta w(t) \qquad (6)$$

where $t$ is the iteration number and $\Delta w(t)$ is the weight correction. The output unit then computes its activation to obtain the signal $y$.

Step 5: Increase iteration $t$ by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.

The relationship between the input observations $(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ and the output value $(y_t)$ is as follows:

$$y_t = a_0 + \sum_{j=1}^{q} a_j\, f\Big(w_{0j} + \sum_{i=1}^{p} w_{ij}\, y_{t-i}\Big) + \varepsilon_t \qquad (7)$$

where $a_j$ ($j = 0, 1, 2, \ldots, q$) is a bias on the $j$th unit, $w_{ij}$ ($i = 0, 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) are the connection weights between the layers of the model, $f(\cdot)$ is the transfer function of the hidden layer, $p$ is the number of input nodes and $q$ is the number of hidden nodes (Lai et al., 2006). Actually, the ANN model in (7) performs a nonlinear functional mapping from the past observations $(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ to the future value $(y_t)$, i.e.,

$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, w) + \varepsilon_t \qquad (8)$$

where $w$ is a vector of all parameters and $f$ is a function determined by the network structure and connection weights. Thus, in some sense, the ANN model is equivalent to a nonlinear autoregressive (NAR) model. A major advantage of neural networks is their ability to provide a flexible nonlinear mapping between inputs and outputs; they can capture the nonlinear characteristics of a time series well.
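Equation (7) is easy to evaluate directly once a series has been arranged into lagged inputs. A minimal NumPy sketch of the mapping in (7)-(8); the sizes echo the (27, 55, 1) architecture discussed later in this chapter, and the random weights are placeholders standing in for trained values:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def nar_predict(history, a0, a, w0, W):
    """Evaluate equation (7): y_t = a0 + sum_j a_j * f(w0_j + sum_i w_ij * y_{t-i}).

    history : the p most recent observations (y_{t-1}, ..., y_{t-p})
    a0, a   : output bias and the q hidden-to-output weights
    w0, W   : hidden biases (q,) and input-to-hidden weights (p, q)
    """
    hidden = sigmoid(w0 + history @ W)
    return a0 + hidden @ a

# Illustrative sizes: p = 27 lagged inputs, q = 55 hidden nodes.
p, q = 27, 55
rng = np.random.default_rng(0)
W, w0 = rng.normal(size=(p, q)), rng.normal(size=q)
a, a0 = rng.normal(size=q), 0.0
y_hat = nar_predict(rng.normal(size=p), a0, a, w0, W)
print(y_hat)
```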
2.4 THE HYBRID FORECASTING METHODOLOGY

The hybrid (Hzhang) methodology proposed by Zhang (2003) consists of a linear component and a nonlinear component. These two components have to be estimated from the data. First, the ARIMA model is used to analyze the linear part of the problem; then, the ANN model is used to model the residuals from the ARIMA model. The results from the ANN can be used as predictions of the error term of the ARIMA model. The hybrid model can be defined as

$$Y_t = L_t + N_t(e_t) \qquad (9)$$

where $L_t$ is the linear component and $N_t(e_t)$ is the nonlinear component fitted to the residuals $e_t$ of the linear model $L_t$ at time $t$.

In this study, another hybrid methodology is proposed to forecast the rice yield series. In the first step, the ANN model is applied to forecast the series data. After fitting the ANN model, the residuals are obtained and an ARIMA model is fitted to them. The results from the ARIMA model are used as predictions of the error of the ANN model. The new hybrid (HNEW) model may be written as

$$Y_t = N_t + L_t(e_t) \qquad (10)$$

where $N_t$ is the nonlinear component and $L_t(e_t)$ is the linear component fitted to the residuals $e_t$ of the nonlinear model $N_t$ at time $t$.

2.5 EMPIRICAL RESULTS

The data set used in this paper is the rice yield data from 1995 to 2003, giving a total of 432 observations. The data were collected from the Muda Agricultural Development Authority (MUDA), Kedah, Malaysia, covering 4 areas with 27 locations. The rice yield series is used in this study to demonstrate the effectiveness of the hybrid method. The time series of the data is given in Figure 2.2.

Figure 2.2 Rice yield data series (1995-2003)

2.6 ARIMA MODELS

The plots in Figure 2.2 indicate that the time series of rice yields is non-stationary in the mean and the variance. The series was transformed using the natural logarithm, and then differencing was applied. The sample ACF and PACF for the transformed series are plotted in Figure 2.3. The sample ACF of the transformed data revealed significant time lags at 1 and 27, while the PACF spikes at lags 1, 2 and 27. The sample ACF and PACF indicate that the rice yields exhibit some pattern of seasonality.

The plots suggest that an ARIMA model is appropriate. Several models were identified, and their statistical results during training are compared in Table 2.1. The criterion used to judge the best model, the mean square error (MSE), shows that ARIMA(0,1,1)×(1,0,1) is the relatively best model. This model has both non-seasonal and seasonal components.
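The HNEW two-stage procedure in (10) can be sketched end to end. The outline below assumes statsmodels for the linear stage and uses scikit-learn's MLPRegressor as a stand-in for the chapter's BP network; the lag count, hidden size and orders are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.statespace.sarimax import SARIMAX

def lagged(series, p):
    """Arrange a series into (samples, p) lagged inputs and their targets."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    return X, series[p:]

def hnew_fit_predict(series, p=27):
    series = np.asarray(series, dtype=float)
    # Stage 1 (nonlinear): the ANN forecasts the series itself, as in eq. (8).
    X, y = lagged(series, p)
    ann = MLPRegressor(hidden_layer_sizes=(55,), max_iter=2000).fit(X, y)
    resid = y - ann.predict(X)                  # e_t = y_t - N_t
    # Stage 2 (linear): ARIMA models the ANN residuals, as in eq. (10).
    lin = SARIMAX(resid, order=(0, 1, 1),
                  seasonal_order=(1, 0, 1, 27)).fit(disp=False)
    # Combined one-step-ahead forecast: Y_t = N_t + L_t(e_t).
    nxt = series[-p:][None, :]                  # lags for the next step
    return ann.predict(nxt)[0] + lin.forecast(1)[0]
```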
Figure 2.3 ACF and PACF for the differenced series of natural logarithms

2.7 NEURAL NETWORK MODELS

In this investigation, we only consider one-step-ahead forecasting with 27 observations. Before the training process begins, data normalization is often performed. The linear transformation to [0, 1] is used:

$$x_n = \frac{x_0 - x_{\min}}{x_{\max} - x_{\min}} \qquad (11)$$

where $x_n$ and $x_0$ represent the normalized and original data, and $x_{\min}$ and $x_{\max}$ represent the minimum and maximum values among the original data. To configure the neural network used in the forecast, the ACF and PACF were used to determine the maximum number of input neurons used during training (Cadenas and Rivera, 2007). The input nodes are 3, 9, 18 and 27.

Table 2.1 Comparison of the statistical results of the ARIMA models

 ARIMA Model      Training MSE  Forecasting MSE
 (4,1,0)×(1,0,0)  0.00599       0.00118
 (4,1,0)×(0,0,1)  0.00358       0.00124
 (0,1,1)×(1,0,1)  0.00335       0.00108
 (0,1,1)×(0,0,1)  0.00348       0.00120

The hidden layer plays a very important role in many successful applications of neural networks. It allows the neural network to detect features, to capture the patterns in the data, and to perform complicated nonlinear mappings between input and output variables. The most common way of determining the number of hidden nodes is via experiment or trial-and-error. In this study, for each input layer size, the number of hidden nodes was determined using the formulas "I/2" (Kang, 1991), "I" (Tang and Fishwick, 1993), "2I" (Wong, 1991) and "2I + 1" (Lippmann, 1987), where I is the number of input neurons. The sigmoid activation function was used as the transfer function at both the hidden and output layers. The network was trained for 5000 epochs using the back-propagation algorithm with a learning rate of 0.001 and a momentum coefficient of 0.9. The results in terms of the MSE statistic for all the ANN models are presented in Table 2.2. Analyzing the results during training, it can be observed that the ANN(27,54,1) and ANN(27,55,1) structures give slightly better forecasts than the other ANNs.

Next, a comparison was made between the ARIMA model, the neural network and the hybrid models for different lead times. A subset ARIMA(0,1,1)×(1,0,1) model with seasonal period 27 was found to be the most parsimonious among all ARIMA models also found adequate by the residual analysis. A neural network architecture of (27, 55, 1) is used to model the nonlinear patterns. Finally, by combining the ARIMA(0,1,1)×(1,0,1) model and the (27, 55, 1) neural network, the hybrid models were obtained. The ARIMA, neural network and hybrid models for the rice yields are discussed here. Figure 2.4 shows a bar chart comparing the validation MSE of the four forecasting methods. The comparison between actual and predicted values is given in Figure 2.5.

Table 2.2 Comparison of the various ANN models together with their performance statistics for the rice yields

 Input  Hidden  Training MSE  Forecasting MSE
 4      2       0.0060        0.0038
 4      4       0.0051        0.0021
 4      8       0.0049        0.0025
 4      9       0.0049        0.0017
 10     5       0.0050        0.0034
 10     10      0.0049        0.0022
 10     20      0.0047        0.0025
 10     21      0.0046        0.0018
 18     9       0.0050        0.0014
 18     18      0.0048        0.0013
 18     36      0.0048        0.0012
 18     37      0.0046        0.0020
 27     13      0.0047        0.0012
 27     27      0.0052        0.0010
 27     54      0.0047        0.0011
 27     55      0.0047        0.0011

Note: Boldface in the original table marks where the prediction performance of the NN is better than that of the others.
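Equation (11) and the four hidden-node heuristics above are one-liners; a minimal sketch, with the printed candidates matching the Hidden column of Table 2.2:

```python
import numpy as np

def normalize(x):
    """Equation (11): linear transformation of the data to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def hidden_node_candidates(I):
    """The four heuristics used above for I input neurons:
    I/2 (Kang, 1991), I (Tang and Fishwick, 1993),
    2I (Wong, 1991) and 2I+1 (Lippmann, 1987)."""
    return [I // 2, I, 2 * I, 2 * I + 1]

for I in (4, 10, 18, 27):
    print(I, hidden_node_candidates(I))   # e.g. 27 -> [13, 27, 54, 55]
```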
From Figure 2.4, it can be seen that the error values of ARIMA and HNEW are the same and are better than those of the Hzhang and NN models in the training process. The results in terms of MSE show an increase in value as the lead time increases. The HNEW model performs well for the first 15 lead times (the first 15 forecasted periods); then the error increases gradually. The maximum percentage error value is 0.17. Both ANN and Hzhang show poor performance in the training and forecasting processes.

Figure 2.4 The MSE in the training and forecasting processes for different lead times

In terms of MSE, the best forecasting results show that the performance of the hybrid model was better than that of the ARIMA and ANN models. This shows that the performance improves for hybrid models.

Figure 2.5 Comparison of the predictions made with ARIMA, ANN, Hzhang and HNEW against the rice yield data across the 27 locations

2.8 CONCLUSIONS

This study compares the performance of the ANN model, the statistical (ARIMA) model and the hybrid models in forecasting the rice yields of Malaysia. The results show that the ANN model's forecasts are considerably less accurate than those of the traditional ARIMA model, which was used as a benchmark. On the other hand, the hybrid model using the ARIMA model and the error of the NN model is an effective way to improve the forecasting performance as measured by MSE. The hybrid model takes advantage of the unique strengths of ARIMA and ANN in linear and nonlinear modeling. For complex problems that have both linear and nonlinear correlation structures, the combination method can be an effective way to improve forecasting performance. The empirical results with the rice yield data set clearly suggest that the hybrid model performs better than the other two models explored in forecasting the rice yields.

2.9 REFERENCES

Box, G. E. P. and Jenkins, G. (1970). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA.

Brockwell, P. J. and Davis, R. A. (2002). Introduction to Time Series and Forecasting. Berlin: Springer.

Cadenas, E. and Rivera, W. (2007). Wind Speed Forecasting in the South Coast of Oaxaca, Mexico. Renewable Energy, 32: 2116-2128.

Kang, S. (1991). An Investigation of the Use of Feedforward Neural Networks for Forecasting. Ph.D. Thesis, Kent State University.

Lai, K. K., Yu, L., Wang, S. and Huang, W. (2006). Hybridizing Exponential Smoothing and Neural Network for Financial Time Series Prediction. ICCS 2006, Part IV, LNCS 3994: 493-500.

Lippmann, R. P. (1987). An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, April, 4-22.

Sharda, R. (1994). Neural Networks for the MS/OR Analyst: An Application Bibliography. Interfaces, 24(2), 116-130.

Tang, Z. and Fishwick, P. A. (1993). Feedforward Neural Nets as Models for Time Series Forecasting. ORSA Journal on Computing, 5(4): 374-385.

Wong, F. S. (1991). Time Series Forecasting Using Backpropagation Neural Networks. Neurocomputing, 2: 147-159.

Zhang, G. P. (2003). Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model. Neurocomputing, 50: 159-175.

Zou, H. F., Xia, G. P., Yang, F. T. and Wang, H. Y. (2007). An Investigation and Comparison of Artificial Neural Network and Time Series Models for Chinese Food Grain Price Forecasting. Neurocomputing, 70: 2913-2923.
3 A PERFORMANCE STUDY OF ENHANCED BP ALGORITHMS ON AIRCRAFT IMAGE CLASSIFICATION

Puteh Saad
Nursafawati Mahshos
Subariah Ibrahim
Rusni Darius

3.1 INTRODUCTION

BP is by far the most widely used algorithm for training MLPs for pattern recognition and other similar tasks. However, it is stigmatized by the problems of slow convergence, instability and overfitting. In addition, the optimal values of the learning rate, the momentum, and the number and dimensions of the hidden layers are obtained through trial and error. In this work, we evaluate eleven (11) enhanced BP algorithms in classifying aircraft images. Each image is represented using a set of Zernike Moment Invariants.

3.2 BP ISSUES

Slow convergence of the BP algorithm means that it takes a long time for the network to learn (to find suitable weights); solving classification problems on high-dimensional data thus becomes intractable, and the problem is classified as NP-complete (Looney, 1997). The slow convergence rate and instability are attributed to the following reasons:

a) BP is a first-order approximation of the steepest-descent technique, which is not the best type of steepest-descent method for finding the minimum of the mean square error (Czap, 2001; Luh and Zhang, 1999).

b) Another peculiarity of the error surface that impacts the performance of the BP algorithm is the presence of local minima (i.e. isolated valleys) in addition to the global minimum. Since BP is basically a hill-climbing technique, it runs the risk of being trapped in a local minimum, where every small change in the synaptic weights increases the error function. But somewhere else in the weight space there exists another set of synaptic weights for which the error function is smaller than the local minimum in which the network is stuck. It is clearly undesirable to have the learning process terminate at a local minimum, especially one located far above the global minimum (Ahmed et al., 2001; Ng and Leung, 2001).

c) The BP algorithm depends on the gradient of the instantaneous error surface in the weight space. The algorithm is therefore stochastic in nature; that is, it has a tendency to zigzag about the true direction to a minimum on the error surface. Indeed, BP learning is an application of a statistical method known as stochastic approximation, and there exist two fundamental causes for this property:

i. The error surface may be fairly flat along a weight dimension, which means that the derivative of the error surface with respect to that weight is small in magnitude. In such a situation, the adjustment applied to the weight is small, and consequently many iterations of the algorithm may be required to produce a significant reduction in the error performance of the network. Alternatively, the error surface may be highly curved along a weight dimension, in which case the derivative of the error surface with respect to that weight is large in magnitude. In this second situation, the adjustment applied to the weight is large, which may cause the algorithm to overshoot the minimum of the error surface (Wen et al., 2000).
ii. The direction of the negative gradient vector may point away from the minimum of the error surface; hence the adjustments applied to the weights may induce the algorithm to move in the wrong direction. Consequently, the rate of convergence in BP training tends to be relatively slow (Sidani and Sidani, 1994).

Overfitting is the phenomenon whereby the error produced during training is small, yet when new data is presented to the network the error is large. This indicates that the network has memorized the training samples but has not learned to generalize to new situations (Patterson, 1996). The phenomenon is due to the type of error function used and the presence of too many neurons in the hidden layer. The improvements made to address these issues are described next.

3.3 BP IMPROVEMENTS

The above issues created fertile ground for innovative research ideas to improve the standard BP, resulting in a plethora of publications. Among the improvements made on different aspects of BP are weight adjustment in the hidden layer, adaptive determination of the learning rate, improvement of the gradient descent method and determination of the momentum. The improvements are performed using heuristic techniques and numerical optimization techniques.

3.3.1 Heuristic Techniques

Heuristic techniques are developed from an analysis of the performance of the standard steepest descent algorithm. One heuristic technique is the introduction of momentum (Negnevitsky, 2002). Other heuristic techniques are variable learning rate BP and resilient BP (Rprop).

The value of the learning rate can be preset or determined adaptively. Conventionally, the rate is preset to an appropriate value and remains constant throughout training. The performance of the algorithm is sensitive to the proper setting of the learning rate. If the learning rate is set too high, the algorithm may oscillate and become unstable; if it is set too small, the algorithm will take a long time to converge. It is not practical to determine the optimal setting for the learning rate before training, and in fact the optimal learning rate changes during the training process as the algorithm moves across the performance surface. Sharda and Patil (1992) suggested three learning rates (0.1, 0.5, 0.9) and three momentum values (0.1, 0.5, 0.9) (Zhang et al., 1998).
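The momentum heuristic mentioned above adds a fraction of the previous weight change to the current one, damping the zigzagging described in Section 3.2. A minimal sketch of the update rule; the learning rate and momentum values are illustrative, echoing the settings quoted above:

```python
import numpy as np

def momentum_update(w, grad, prev_delta, lr=0.5, momentum=0.9):
    """Gradient descent with momentum:
    delta_w(t) = -lr * dE/dw + momentum * delta_w(t-1)."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

# Illustrative use on a quadratic error surface E(w) = ||w||^2 / 2,
# whose gradient is simply w.
w = np.array([1.0, -2.0])
delta = np.zeros_like(w)
for _ in range(20):
    w, delta = momentum_update(w, grad=w, lr=0.1, momentum=0.5)
print(w)  # converges toward the minimum at the origin
```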
The sigmoid activation functions are characterized by the fact that their slope must approach zero as the input gets large. This raises a problem when gradient descent is used to train an MLP based on sigmoid functions, since the gradient can have a very small magnitude and therefore cause only small changes in the weights and biases, even when the weights and biases are far from their optimal values (Haykin, 1999). The resilient BP (Rprop) eliminates these harmful effects of the magnitude of the partial derivatives by using only the sign of the derivative to determine the direction of the weight update; the magnitude of the derivative has no effect on the update. The size of the weight change is determined by a separate update value:

i. Whenever the derivative of the error function with respect to a weight has the same sign for two successive iterations, the update value for that weight and bias is increased by a certain factor.
ii. Whenever the derivative of the error function with respect to a weight changes sign from the previous iteration, the update value for that weight and bias is decreased by a certain factor.
iii. If the derivative is zero, the update value remains the same.
iv. If the weights are oscillating, the weight change is reduced.
v. If the weight change continues in the same direction for several iterations, the magnitude of the weight change is increased.
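A minimal Python sketch of this sign-based scheme follows; the increase and decrease factors (1.2 and 0.5) and the step-size bounds are illustrative choices, not values prescribed by the text.

import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 inc=1.2, dec=0.5, step_max=50.0, step_min=1e-6):
    # Resilient BP: only the sign of the derivative is used; its
    # magnitude has no effect on the size of the weight change.
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * inc, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * dec, step_min), step)
    w_new = w - np.sign(grad) * step   # move each weight by its own step size
    return w_new, step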
3.3.2 Numerical Optimization Techniques

The problem of finding suitable weights that enable the MLP to generalize is an optimization problem (Stegeman and Buenfield, 1999). The significant numerical optimization algorithms suggested to overcome the weaknesses of the standard BP fall into two categories: conjugate gradient algorithms and quasi-Newton algorithms.

In the standard BP, the error function decreases most rapidly along the negative of the gradient, but fastest convergence is not guaranteed. Hence, in a conjugate gradient algorithm, a search for a step size that minimizes the error function is performed along conjugate directions, and the step size is updated in every iteration. Versions of the conjugate gradient technique are the Fletcher-Reeves update, the Polak-Ribiere update and Powell-Beale restarts; the versions are distinguished by the manner in which the ratio of the norm squared of the current gradient to the norm squared of the previous gradient is computed. The conjugate gradient algorithms are normally much faster than variable learning rate BP, although the results are problem dependent. They are often a good choice for networks with a large number of weights (Charalambous, 1992).

Newton's method computes the Hessian matrix (second derivatives) for a feedforward MLP, but this is complex and expensive. The quasi-Newton (or secant) method does not require the computation of the Hessian matrix; an approximate Hessian matrix is computed as a function of the gradient. BFGS, named after its founders Broyden, Fletcher, Goldfarb and Shanno, is an example of a successful quasi-Newton method. Although the BFGS algorithm converges faster than the conjugate gradient method, it requires more computation and more storage: the approximate Hessian matrix of dimension n x n must be stored, where n is the number of weights and biases, hence its storage requirement is high (Ribert et al., 1999).

The One-Step Secant algorithm is proposed to overcome this weakness of the BFGS algorithm by not storing the complete Hessian matrix. It assumes that at each iteration the previous Hessian was the identity matrix, so the new search direction can be calculated without computing a matrix inverse. The algorithm is considered a compromise between the conjugate gradient and quasi-Newton methods: it requires less storage and computation per iteration than the BFGS algorithm but slightly more storage and computation per epoch than the conjugate gradient algorithms.

The Levenberg-Marquardt algorithm instead estimates the Hessian matrix from the first derivatives computed through standard BP; a Jacobian matrix is used to store the first derivatives. The algorithm appears to be the fastest method for training moderate-sized MLPs (up to several hundred weights), but it suffers from a high storage requirement, since it needs to store the Jacobian matrix, whose size is Q x n, where Q is the number of training sets and n is the number of weights and biases in the network. To avoid the high storage requirement, the Reduced Memory Levenberg-Marquardt algorithm is suggested. The algorithm splits the Jacobian matrix into two submatrices as follows:

H = J^{T}J = \begin{bmatrix} J_1^{T} & J_2^{T} \end{bmatrix} \begin{bmatrix} J_1 \\ J_2 \end{bmatrix} = J_1^{T}J_1 + J_2^{T}J_2

The approximate Hessian can be calculated by summing a series of subterms; once one subterm has been computed, the corresponding submatrix of the Jacobian can be eliminated, so the full Jacobian does not have to exist at one time (Marquardt, 1963; Hagan and Menhaj, 1994). Tables 3.1 to 3.3 summarize the algorithms explored in this work.

Table 3.1 Enhanced BP algorithms

BFG (BFGS quasi-Newton): requires more computation in each iteration and more storage than the conjugate gradient methods, although it generally converges in fewer iterations.
CGB (Conjugate gradient with Powell-Beale restarts): the traincgb routine performs somewhat better than traincgp on some problems, although performance on any given problem is difficult to predict.
CGF (Fletcher-Reeves conjugate gradient): updates weight and bias values according to conjugate gradient backpropagation with Fletcher-Reeves updates; usually much faster than variable learning rate backpropagation, and sometimes faster than trainrp, although the results vary from one problem to another.
CGP (Polak-Ribiere conjugate gradient): updates weight and bias values according to conjugate gradient backpropagation with Polak-Ribiere updates.

Table 3.2 Enhanced BP algorithms (continued)

SCG (Scaled conjugate gradient): requires more iterations to converge than the other conjugate gradient algorithms, but the number of computations in each iteration is significantly reduced because no line search is performed.
GD (Basic gradient descent): a network training function that updates weight and bias values according to gradient descent.
GDM (Gradient descent with momentum): a batch algorithm for feedforward networks that often provides faster convergence. Momentum allows a network to respond not only to the local gradient but also to ignore small features in the error surface; without momentum, a network may get stuck in a shallow local minimum.
GDX (Variable learning rate backpropagation): the function traingdx combines an adaptive learning rate with momentum training, but can only be used in batch mode.

Table 3.3 Enhanced BP algorithms (continued)

LM (Levenberg-Marquardt): updates weight and bias values according to Levenberg-Marquardt optimization. The storage requirements of trainlm are larger than those of the other algorithms tested.
OSS (One-Step Secant): requires less storage and computation per epoch than the BFGS algorithm, and slightly more storage and computation per epoch than the conjugate gradient algorithms; it can be considered a compromise between full quasi-Newton algorithms and conjugate gradient algorithms.
RP (Resilient backpropagation): generally much faster than the standard steepest descent algorithm, with only a modest increase in memory requirements: the update values for each weight and bias, equivalent to storage of the gradient, must be stored.

3.4 FEATURE EXTRACTION USING ZERNIKE MOMENT INVARIANT

Zernike Moment (ZM) is chosen since it is invariant to rotation and insensitive to noise. Another advantage of ZM is the ease of image reconstruction, owing to its orthogonality property (Teague, 1980). ZM is the projection of the image function onto orthogonal basis functions, and the magnitude of a ZM does not change when the image is rotated. The major drawback of ZM is its computational complexity (Mukundan and Ramakrishnan, 1998), due to the recursive computation of the radial polynomials. In this study, we overcome this problem by adopting a non-recursive computation of the radial polynomials, based on the relationship between geometric moment invariants and ZM, in order to derive the Zernike moment invariants.

The ZM of order p with repetition q for a continuous image function f(x, y) that vanishes outside the unit circle is given by Equation (1):

Z_{pq} = \frac{p+1}{\pi} \iint_{x^2+y^2 \le 1} f(r,\theta)\, V_{pq}^{*}(r,\theta)\, r\, dr\, d\theta    (1)

To compute the ZM of a given image, the center of the image is taken as the origin and the pixel coordinates are mapped to the range of the unit circle, i.e. x^2 + y^2 \le 1. The functions V_{pq}(r, \theta) denote the Zernike polynomials of order p with repetition q, defined as functions of the polar coordinates (r, \theta), and * denotes the complex conjugate. Thus |Z_{pq}|, the magnitude of the ZM, can be taken as a rotation-invariant feature of the underlying image function. The rotation invariants and their corresponding expressions in geometric moments, up to order three, are given in Equation (2):

Z_{00} = \frac{1}{\pi} M_{00}
|Z_{11}|^2 = \left(\frac{2}{\pi}\right)^2 (M_{10}^2 + M_{01}^2)
Z_{20} = \frac{3}{\pi} \left[ 2(M_{20} + M_{02}) - M_{00} \right]
|Z_{22}|^2 = \left(\frac{3}{\pi}\right)^2 \left[ (M_{20} - M_{02})^2 + 4M_{11}^2 \right]    (2)
|Z_{31}|^2 = \left(\frac{12}{\pi}\right)^2 \left[ (M_{30} + M_{12})^2 + (M_{03} + M_{21})^2 \right]
|Z_{33}|^2 = \left(\frac{4}{\pi}\right)^2 \left[ (M_{30} - 3M_{12})^2 + (M_{03} - 3M_{21})^2 \right]
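A minimal Python sketch of this feature extraction is given below, assuming a binary (thresholded) image whose pixel coordinates are mapped into the unit circle; the helper geometric_moment is illustrative rather than taken from the chapter.

import numpy as np

def geometric_moment(img, p, q):
    # Raw geometric moment M_pq over pixel coordinates mapped
    # into the unit circle, with the image centre as origin.
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (2.0 * xs - (w - 1)) / w       # map columns into [-1, 1]
    y = (2.0 * ys - (h - 1)) / h       # map rows into [-1, 1]
    return np.sum((x ** p) * (y ** q) * img)

def zernike_invariants(img):
    # The six rotation invariants of Equation (2).
    M = lambda p, q: geometric_moment(img, p, q)
    pi = np.pi
    phi1 = (1 / pi) * M(0, 0)
    phi2 = (2 / pi) ** 2 * (M(1, 0) ** 2 + M(0, 1) ** 2)
    phi3 = (3 / pi) * (2 * (M(2, 0) + M(0, 2)) - M(0, 0))
    phi4 = (3 / pi) ** 2 * ((M(2, 0) - M(0, 2)) ** 2 + 4 * M(1, 1) ** 2)
    phi5 = (12 / pi) ** 2 * ((M(3, 0) + M(1, 2)) ** 2 + (M(0, 3) + M(2, 1)) ** 2)
    phi6 = (4 / pi) ** 2 * ((M(3, 0) - 3 * M(1, 2)) ** 2 + (M(0, 3) - 3 * M(2, 1)) ** 2)
    return [phi1, phi2, phi3, phi4, phi5, phi6]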
3.5 IMPLEMENTATION, RESULTS AND DISCUSSION

The coloured aircraft images, downloaded from the internet (www.airliners.net), are converted into gray-level format. Noise is then removed from each image, and the image is subsequently thresholded using the Otsu thresholding algorithm. The aircraft image samples are grouped into three categories based on their types: Category 1 represents commercial aircraft, Category 2 represents cargo aircraft, and Category 3 represents military aircraft.

In order to evaluate the intra-class invariance and the robustness of the feature extraction technique adopted, each aircraft image is perturbed by scaling and rotation factors to generate 12 variations of the same image. An example image with its variations is depicted in Figures 3.1 and 3.2, and Table 3.4 describes each variation. Four scaling factors are chosen, namely 0.5, 0.75, 1.3 and 1.5, and four rotation angles, namely 5, 15, 45 and 90 degrees, while four further images for each aircraft are perturbed by both factors (0.5 with 5 degrees, 0.75 with 15 degrees, 1.3 with 45 degrees and 1.5 with 90 degrees). Each category of aircraft has 10 different models; hence each category consists of 10 original images and 120 perturbed images, making a total of 390 images.

Figure 3.1 An aircraft image and its variations (i)-(vi)

Figure 3.2 An aircraft image and its variations (vii)-(xiii) (continued)

Zernike Moment features are extracted from the image samples using Equation (2). Each image has a set of six (6) features, denoted φ1 to φ6. Due to space constraints, the features are displayed in two tables: Table 3.5 records features φ1 to φ3 and Table 3.6 lists features φ4 to φ6. It is observed that the ZM of orders 1 and 2 have null values while order 3 is significant; thus only φ3 to φ6 are used for classification by the enhanced BP algorithms.

Table 3.4 Types of variations of the aircraft images

Image  Variation
(i)    Original image
(ii)   The image is reduced to 0.5 of its original size
(iii)  The image is reduced to 0.75 of its original size
(iv)   The image is enlarged to 1.3 times its original size
(v)    The image is enlarged to 1.5 times its original size
(vi)   The image is rotated by 5 degrees
(vii)  The image is rotated by 15 degrees
(viii) The image is rotated by 45 degrees
(ix)   The image is rotated by 90 degrees
(x)    The image is reduced to 0.5 and rotated by 5 degrees
(xi)   The image is reduced to 0.75 and rotated by 15 degrees
(xii)  The image is enlarged to 1.3 and rotated by 45 degrees
(xiii) The image is enlarged to 1.5 and rotated by 90 degrees

A k-fold cross-validation technique is used to validate the classification results. The image samples are divided into k subsets, and the cross-validation process is repeated k times so that every subset is used for both training and testing. The number of correct classifications is computed using Equation (3), in which n refers to the number of test data, σ(x, y)_t = 1 if the t-th test vector is classified correctly, and σ(x, y)_t = 0 otherwise. The percentage of correct classification is given by Equation (4):

NCC_k = \sum_{t=1}^{n} \sigma(x, y)_t    (3)

PCC = \frac{100}{G} \sum_{k=1}^{4} NCC_k    (4)

where G is the total number of test samples over the four folds.
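A minimal sketch of this evaluation follows; train and classify are hypothetical routines standing in for the BP training and classification of a fold, and the samples are assumed to be shuffled beforehand.

def cross_validated_pcc(samples, labels, train, classify, k=4):
    # Equations (3) and (4): accumulate the number of correct
    # classifications (NCC) over the k test folds.
    fold = len(samples) // k
    ncc_sum, g = 0, 0
    for j in range(k):
        test = range(j * fold, (j + 1) * fold)
        model = train([i for i in range(len(samples)) if i not in test])
        for t in test:
            sigma = 1 if classify(model, samples[t]) == labels[t] else 0
            ncc_sum += sigma            # eq. (3), summed over the fold
            g += 1
    return 100.0 * ncc_sum / g          # PCC, eq. (4)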
Table 3.5 The ZM feature vectors of aircraft images, φ1 to φ3

ZMI          φ1       φ2       φ3
Original     0.00000  0.00000  0.496054
5°           0.00000  0.00000  0.495574
15°          0.00000  0.00000  0.495431
45°          0.00000  0.00000  0.497110
90°          0.00000  0.00000  0.495590
0.5          0.00000  0.00000  0.495738
0.75         0.00000  0.00000  0.495770
1.3          0.00000  0.00000  0.495620
1.5          0.00000  0.00000  0.495642
15° x 0.75   0.00000  0.00000  0.496542
45° x 1.3    0.00000  0.00000  0.495091
90° x 1.5    0.00000  0.00000  0.495644

In order to set the architecture of the BP network, the first task is to find a suitable number of hidden nodes. The BP network is trained several times and the results are recorded in Table 3.7. The desirable number of hidden nodes is 28, since it produces the smallest MSE.

Table 3.6 The ZM feature vectors of aircraft images, φ4 to φ6

ZMI          φ4        φ5        φ6
Original     0.008847  0.020535  0.009627
5°           0.008694  0.019817  0.008370
15°          0.008739  0.020038  0.009627
45°          0.009031  0.020004  0.007078
90°          0.008648  0.020367  0.006955
0.5          0.008825  0.020038  0.009411
0.75         0.008757  0.019979  0.009484
1.3          0.008701  0.020155  0.009549
1.5          0.008691  0.020191  0.009568
15° x 0.75   0.008945  0.019826  0.005711
45° x 1.3    0.008708  0.019019  0.006961
90° x 1.5    0.008691  0.020185  0.006881

Table 3.7 Number of hidden nodes

Neurons  Epochs  MSE        Time (s)  NCC
7        10000   0.0296929  372.22    82
14       60      0.0198156  6.11      83
21       62      0.0192405  8.97      79
28       33      0.0180998  8.42      80
35       24      0.0196875  9.47      78

Legend: NCC - number of correct classifications

An experiment is further conducted on all the enhanced BP algorithms studied, to search for a number of epochs that produces the smallest MSE. Table 3.8 records the findings for 30,000 epochs.

Table 3.8 MSE during BP training at 30,000 epochs

Algorithm  Time (s)  Epochs  MSE
BFG        143.50    2958    0.00998265
LM         13.22     58      0.00969848
RP         43.50     4175    0.00999991
SCG        765.58    30000   0.01253640
CGB        313.84    12020   0.00999982
CGF        617.30    26787   0.01397950
CGP        950.11    30000   0.01631220
OSS        801.38    30000   0.01059640
GD         520.00    30000   0.05695580
GDM        625.78    30000   0.05435680
GDX        742.41    30000   0.03685260

It is observed that only four algorithms (BFG, LM, RP and CGB) reach an MSE less than 0.01; another four (SCG, CGF, CGP, OSS) produce an MSE between 0.01 and 0.02; and the MSE of GD, GDM and GDX stays below 0.06. In order to decrease the MSE further, the maximum number of epochs is increased to 50,000; the findings are tabulated in Table 3.9. The target MSE is thus fixed at 0.02 and the number of epochs at 50,000.

Table 3.9 MSE of BP training at 50,000 epochs

Algorithm  Time (s)  Epochs  MSE
BFG        67.72     1324    0.01999730
LM         7.70      33      0.01809980
RP         16.80     1497    0.01999640
SCG        120.86    5598    0.01999980
CGB        82.77     2876    0.01999930
CGF        209.25    8640    0.02000000
CGP        121.47    4985    0.01999980
OSS        181.27    8032    0.01999510
GD         737.64    50000   0.04936950
GDM        634.44    50000   0.04779880
GDX        665.75    50000   0.03241720

Finally, the parameters used to train the BP network for all algorithms are listed in Table 3.10.
Table 3.10 Parameters used to train the enhanced BP algorithms

Parameter               Value
Learning rate (lr)      0.2
Momentum (mc)           0.9
Target error            0.02
Number of hidden nodes  28
Epochs                  50000

Based on the parameters listed in Table 3.10, the BP network is trained and tested with all the enhanced BP algorithms. The cross-validation results are recorded in Tables 3.11 to 3.13.

Table 3.11 Classification results

Algo  k  Epochs  MSE        Time (s)  NCC  PCC    Av time (s)
BFG   1  1324    0.0199973  67.72     85   77.95  61.18
      2  925     0.0199992  60.50     81
      3  762     0.0199998  66.20     71
      4  619     0.0199512  50.31     67
LM    1  33      0.0180998  7.70      80   76.15  8.33
      2  26      0.0194518  8.11      78
      3  26      0.0193606  9.55      72
      4  25      0.0196315  7.97      67

Table 3.12 Classification results (continued)

Algo  k  Epochs  MSE        Time (s)  NCC  PCC    Av time (s)
RP    1  1497    0.0199964  16.80     79   80.77  25.77
      2  2554    0.0199973  33.58     79
      3  2132    0.0199987  37.88     80
      4  810     0.0199935  14.80     77
SCG   1  5598    0.0199998  120.86    81   78.21  96.01
      2  3786    0.0199993  90.81     82
      3  4175    0.0199924  134.69    74
      4  1265    0.0199955  37.66     68
CGB   1  2876    0.0199993  82.77     84   79.23  117.27
      2  3724    0.0199920  134.75    81
      3  4001    0.0199982  191.22    75
      4  1494    0.0199930  60.34     69
CGF   1  8640    0.0200000  209.25    84   78.46  163.44
      2  5244    0.0199995  167.13    83
      3  4774    0.0199999  208.39    75
      4  1620    0.0199962  68.97     64
CGP   1  4985    0.0199998  121.47    84   78.46  137.60
      2  5285    0.0199986  166.72    79
      3  4522    0.0199867  199.64    75
      4  1536    0.0199929  62.56     68

Table 3.13 Classification results (continued)

Algo  k  Epochs  MSE        Time (s)  NCC  PCC    Av time (s)
OSS   1  8032    0.0199951  181.27    79   76.67  274.50
      2  9948    0.0199848  268.61    78
      3  11093   0.0199705  508.42    73
      4  4311    0.0199353  139.70    69
GD    1  50000   0.0493695  737.64    53   54.87  848.00
      2  50000   0.0494339  758.11    55
      3  50000   0.0500599  1010.42   56
      4  50000   0.0449906  885.84    50
GDM   1  50000   0.0477988  634.44    53   53.33  813.55
      2  50000   0.0522435  685.25    56
      3  50000   0.0496276  1061.97   52
      4  50000   0.0464307  872.53    47
GDX   1  50000   0.0324172  665.75    61   66.92  726.51
      2  50000   0.0316615  733.31    72
      3  50000   0.0324195  796.33    66
      4  50000   0.0281532  710.66    62

The results are analyzed from two aspects: the percentage of correct classification (see Figure 3.3) and the time taken to reach the target output (see Figure 3.4). In terms of percentage classification, RP wins with the highest classification rate, but it loses to LM in terms of computation time. This finding is in accordance with the theory, which claims that LM converges faster since it estimates the Hessian matrix from the first derivatives without having to store the full Jacobian matrix: the algorithm splits the Jacobian matrix into two sub-matrices, the approximate Hessian is calculated by summing a series of sub-terms, and once one sub-term has been computed, the corresponding sub-matrix of the Jacobian can be eliminated.

Figure 3.3 Percentage classification of the enhanced BP algorithms

Figure 3.4 Computation time of each algorithm

Another observation concerns the performance of BFG, SCG, CGB, CGF, CGP and OSS, whose percentage classification is lower than that of RP by up to five percent. The algorithms GD, GDM and GDX display poor performance, owing to the heuristic nature of gradient descent. Hence, among all the enhanced BP algorithms examined, RP (Resilient Backpropagation) achieves the highest classification rate, while LM wins in terms of computation time.
3.6 REFERENCES

Ahmed, W. A. M., Saad, E. S. M. and Aziz, E. S. A. (2001). Modified back propagation algorithm for learning artificial neural networks. Proceedings of the Eighteenth National Radio Science Conference NRSC 2001. 1: 345-352.
Airliners.net. Aviation Photos. www.airliners.net (accessed 10 August 2007).
Charalambous, C. (1992). Conjugate gradient algorithm for efficient training of artificial neural networks. Proceedings of the IEEE. 139(3): 301-310.
Czap, H. (2001). Construction and interpretation of multi-layer perceptrons. IEEE International Conference on Systems, Man, and Cybernetics. 5: 3349-3354.
Hagan, M. T. and Menhaj, M. B. (1994). Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks. 5(6): 989-993.
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Toronto, Canada: Maxwell Macmillan.
Looney, C. G. (1997). Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists. New York: Oxford University Press.
Luh, P. B. and Zhang, L. (1999). A novel neural learning algorithm for multilayer perceptrons. International Joint Conference on Neural Networks IJCNN '99. 3: 1696-1701.
Marquardt, D. (1963). An algorithm for least squares estimation of non-linear parameters. Journal of the Society for Industrial and Applied Mathematics: 431-441.
Mukundan, R. and Ramakrishnan, K. R. (1998). Moment Functions in Image Analysis. Singapore: World Scientific Publishing.
Negnevitsky, M. (2005). Artificial Intelligence: A Guide to Intelligent Systems. 2nd ed. Harlow: Addison-Wesley.
Ng, S. C. and Leung, S. H. (2001). On solving the local minima problem of adaptive learning by using deterministic weight evolution algorithm. Proceedings of the 2001 Congress on Evolutionary Computation. 1: 251-255.
Patterson, D. W. (1996). Artificial Neural Networks: Theory and Applications. Singapore: Prentice-Hall.
Ribert, A., Stocker, E., Lecourtier, Y. and Ennaji, A. (1999). A survey on supervised learning by evolving multi-layer perceptrons. Proceedings of the Third International Conference on Computational Intelligence and Multimedia Applications ICCIMA '99: 122-126.
Sidani, A. and Sidani, T. (1994). A comprehensive study of the backpropagation algorithm and modifications. Southcon/94 Conference Record: 80-84.
Teague, M. R. (1980). Image analysis via the general theory of moments. Journal of the Optical Society of America. 70(8): 920-930.
Wen, J., Zhao, J. L., Luo, S. W. and Han, Z. (2000). The improvements of BP neural network learning algorithm. 5th International Conference on Signal Processing Proceedings WCCC-ICSP 2000. 3: 1647-1649.
Zhang, G., Patuwo, B. D. and Hu, M. Y. (1998). Forecasting with artificial neural networks: the state of the art. International Journal of Forecasting. 14: 35-62.

4
ANFIS FOR RICE YIELDS FORECASTING

Ruhaidah Samsudin
Puteh Saad
Ani Shabri

4.1 INTRODUCTION

Almost 90% of rice is produced and consumed in Asia, and 96% in developing countries. In Malaysia, the Third Agriculture Policy (1998-2010) was established to meet at least 70% of Malaysia's demand, a 5% increase over the previously targeted 65%. The remaining 30% comes from imported rice, mainly from Thailand, Vietnam and China (Saad et al., 2006). Raising the level of national rice self-sufficiency has become a strategic issue for the agricultural ministry of Malaysia.
The ability to forecast the future enables farm managers to take the most appropriate decisions in anticipation of that future. The accuracy of time series forecasting is fundamental to many decision processes (Zou et al., 2007). One of the most important and widely used time series models is the artificial neural network (ANN). ANNs are being used increasingly often for time series forecasting, pattern classification and pattern recognition (Ho et al., 2002). The ANN provides an attractive alternative tool for forecasting researchers and practitioners, and has demonstrated its nonlinear modeling capability in time series forecasting.

Another approach uses fuzzy logic, which was first developed by Zadeh to describe human thinking and decision making (Sen and Altunkaynak, 2006). Several studies have applied fuzzy logic in hydrology and water resources planning (Chang et al., 2001; Liong et al., 2000; Mahabir et al., 2000; Ozelkan and Duckstein, 2001; Sen and Altunkaynak, 2006). Recently, the adaptive neuro-fuzzy inference system (ANFIS), which combines the ANN and fuzzy logic methods, has been used for several applications, such as database management, system design, and the planning and forecasting of water resources (Chang and Chang, 2006; Chang et al., 2001; Chen et al., 2006; Firat and Gungor, 2007; Firat, 2007; Nayak et al., 2004).

The main purpose of this study is to investigate the applicability and capability of ANFIS and ANN for modeling rice yields time-series forecasting. To verify the approach, rice yields data from 27 stations in Peninsular Malaysia are chosen as the case study.

4.2 ARTIFICIAL NEURAL NETWORK (ANN)

Recently, the ANN has been extensively studied and used in time series forecasting; Zhang presented a recent review of this area (Zhang, 2003). The major advantage of ANNs is their ability to model complex nonlinear relationships without a priori assumptions about the nature of the relationship. The ANN with a single hidden layer feedforward network is the most widely used model for modeling and forecasting. The model is characterized by a network of three layers of simple processing units connected by acyclic links. The relationship between the input observations (y_{t-1}, y_{t-2}, ..., y_{t-p}) and the output value y_t has the following form:

y_t = a_0 + \sum_{j=1}^{q} a_j f\left( w_{0j} + \sum_{i=1}^{p} w_{ij} y_{t-i} \right) + \varepsilon_t    (1)

where a_j (j = 0, 1, 2, ..., q) is a bias on the jth unit, w_{ij} (i = 0, 1, 2, ..., p; j = 0, 1, 2, ..., q) are the connection weights between the layers of the model, f(·) is the transfer function of the hidden layer, p is the number of input nodes and q is the number of hidden nodes.

Training a network is an essential factor for the success of a neural network. Among the several learning algorithms available, back-propagation has been the most popular and most widely implemented learning algorithm for all neural network paradigms (Zou et al., 2007), and it is used in the following experiments. The ANN model in (1) performs a nonlinear functional mapping from the past observations (y_{t-1}, y_{t-2}, ..., y_{t-p}) to the future value y_t, i.e.,

y_t = f(y_{t-1}, y_{t-2}, ..., y_{t-p}, w) + \varepsilon_t    (2)

where w is a vector of all parameters and f is a function determined by the network structure and the connection weights. Thus, in some sense, the ANN model is equivalent to a nonlinear autoregressive (NAR) model. A major advantage of neural networks is their ability to provide a flexible nonlinear mapping between inputs and outputs; they can capture the nonlinear characteristics of a time series well.
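As a concrete sketch of Equation (1), the forward pass of such a single-hidden-layer forecaster can be written as follows; a logistic transfer function is assumed, and the weights would be fitted by back-propagation.

import numpy as np

def ann_forecast(y_lags, a0, a, w0, W):
    # One-step-ahead forecast of Equation (1):
    # y_t = a0 + sum_j a_j * f(w0_j + sum_i w_ij * y_{t-i})
    # y_lags : the p past observations (y_{t-1}, ..., y_{t-p})
    # a0, a  : output bias and hidden-to-output weights (length q)
    # w0, W  : hidden biases (length q) and q x p input weights
    f = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic transfer function
    return a0 + a @ f(w0 + W @ y_lags)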
4.3 THE ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)

The ANFIS is a multilayer feed-forward network consisting of nodes and directional links, which combines the learning capabilities of a neural network with the reasoning capabilities of fuzzy logic. Since Zadeh proposed the fuzzy logic approach to describe complicated systems, it has become popular and has been used successfully in various engineering problems (Liong et al., 2000; Chang et al., 2001; Sen and Altunkaynak, 2006; Nayak et al., 2004; Chang and Chang, 2006; Chen et al., 2006; Firat and Gungor, 2007). ANFIS has been used by many researchers to organize the network structure itself and to adapt the parameters of the fuzzy system for many engineering problems, such as the modeling of agricultural time series.

The fuzzy inference system is a rule-based system consisting of three conceptual components: (1) a rule base, containing fuzzy if-then rules; (2) a database, defining the membership functions; and (3) an inference system, combining the fuzzy rules and producing the system results (Sen and Altunkaynak, 2006). The first phase of fuzzy logic modeling is the determination of the membership functions of the input-output variables, the second is the construction of the fuzzy rules, and the last is the determination of the output characteristics, the output membership function and the system results (Firat and Gungor, 2007; Murat, 2006).

Figure 4.1 (a) Sugeno's fuzzy if-then rule and fuzzy reasoning mechanism; (b) equivalent ANFIS architecture

Two methods, the back-propagation algorithm and the hybrid-learning algorithm, provide the learning of the ANFIS, the construction of the rules, and the determination of the membership functions of the input-output variables. A general structure of the fuzzy system is shown in Figure 4.1. ANFIS has been shown to be powerful in modeling numerous processes, such as rainfall-runoff modeling and real-time reservoir operation (Chang and Chang, 2006; Firat and Gungor, 2007; Chen et al., 2006). ANFIS uses the learning ability of the ANN to define the input-output relationship and to construct the fuzzy rules by determining the input structure, while the system results are obtained by the thinking and reasoning capability of fuzzy logic. The hybrid-learning algorithm and a subtractive clustering function are used to determine the input structure; the detailed algorithm and mathematical background of the hybrid-learning algorithm can be found in Jang et al. (1997).

There are two types of fuzzy inference system in the literature: the Takagi-Sugeno inference system and the Mamdani inference system. In this study, the Takagi-Sugeno inference system is used for the modeling of agricultural time series. The most important difference between these systems is the definition of the consequence parameter: in the Sugeno inference system the consequence parameter is either a linear equation (a 'first-order Sugeno inference system') or a constant coefficient (a 'zero-order Sugeno inference system'). It is assumed here that the fuzzy inference system includes two inputs, x and y, and one output, z.
For the first-order Sugeno inference system, two typical rules can be expressed as

Rule 1: IF x is A_1 and y is B_1 THEN f_1 = p_1 x + q_1 y + r_1
Rule 2: IF x is A_2 and y is B_2 THEN f_2 = p_2 x + q_2 y + r_2

where x and y are the crisp inputs to node i, A_i and B_i are linguistic labels such as low, medium and high, characterized by convenient membership functions, and p_i, q_i and r_i are the consequence parameters. The structure of this fuzzy inference system is shown in Figure 4.1. It consists of five layers, as described below.

Input nodes (layer 1). Each node in this layer generates the membership grades of the crisp inputs, which belong to each of the convenient fuzzy sets, by using the membership functions. Each node's output O_i^1 is calculated by

O_i^1 = \mu_{A_i}(x) for i = 1, 2;  O_i^1 = \mu_{B_{i-2}}(y) for i = 3, 4    (3)

where \mu_{A_i} and \mu_{B_{i-2}} are the membership functions of the fuzzy sets A_i and B_i respectively. Various membership functions, such as the trapezoidal, triangular, Gaussian and generalized bell membership functions, can be applied to determine the membership grades. If the generalized bell membership function is used, it is given by

O_i^1 = \mu_{A_i}(x) = \frac{1}{1 + \left[ \left( \frac{x - c_i}{a_i} \right)^2 \right]^{b_i}}    (4)

where a_i, b_i and c_i are the parameter set.

Rule nodes (layer 2). In this layer the AND/OR operator is applied to obtain one output representing the result of the antecedent of a fuzzy rule, i.e. its firing strength. The outputs of the second layer, the firing strengths w_i, are the products of the corresponding degrees obtained from layer 1:

O_i^2 = w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \quad i = 1, 2    (5)

Average nodes (layer 3). The main target here is to compute the ratio of the firing strength of the ith rule to the sum of the firing strengths of all rules. The firing strength in this layer is normalized as

O_i^3 = \bar{w}_i = \frac{w_i}{\sum_i w_i}, \quad i = 1, 2    (6)

Consequent nodes (layer 4). The contribution of the ith rule towards the total output, according to the function defined, is calculated by

O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i), \quad i = 1, 2    (7)

where \bar{w}_i is the ith node output from the previous layer and {p_i, q_i, r_i} is the parameter set of the consequence function, i.e. the coefficients of the linear combination in the Sugeno inference system.

Output nodes (layer 5). This layer consists of a single node that computes the overall output by summing all incoming signals; it is the last step of the ANFIS. The output of the system is calculated as

f(x, y) = \frac{w_1(x, y) f_1(x, y) + w_2(x, y) f_2(x, y)}{w_1(x, y) + w_2(x, y)} = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2}    (8)

O_i^5 = f(x, y) = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}    (9)

Similarly to ANNs, an ANFIS network can be trained by supervised learning to map a particular input to a specific target output. ANFIS applies the hybrid-learning algorithm to achieve this aim, combining the gradient descent and least-squares methods: the gradient descent method is used to fit the non-linear input parameters, and the least-squares method is employed to identify the linear output parameters (p_i, q_i, r_i). The antecedent parameters, i.e. the membership functions used in layer 2, are applied to construct the rules of the ANFIS model.
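A minimal Python sketch of the five-layer computation for the two-rule system above, using the generalized bell membership function of Equation (4), is given below; all parameter values are illustrative.

def bell(v, a, b, c):
    # Generalized bell membership function, Equation (4).
    return 1.0 / (1.0 + (((v - c) / a) ** 2) ** b)

def sugeno_output(x, y, A, B, consequents):
    # Layers 1-5 for the two-rule first-order Sugeno system above.
    # A, B        : (a, b, c) bell parameters of each rule's A_i and B_i
    # consequents : (p, q, r) linear consequent parameters of each rule
    w = [bell(x, *A[i]) * bell(y, *B[i]) for i in range(2)]   # eqs. (3), (5)
    wbar = [wi / sum(w) for wi in w]                          # eq. (6)
    f = [p * x + q * y + r for (p, q, r) in consequents]      # rule outputs
    return sum(wb * fi for wb, fi in zip(wbar, f))            # eqs. (7)-(9)

# illustrative parameters for Rule 1 and Rule 2
A = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]
B = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]
consequents = [(0.5, 0.2, 0.1), (-0.3, 0.8, 0.0)]
print(sugeno_output(0.4, 0.7, A, B, consequents))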
Since the input variables within a range might be clustered into several classes, the structure of the input layer needs to be determined accurately. The 'subtractive fuzzy clustering' function, which offers an effective result using fewer rules, is applied to solve this problem in ANFIS modeling (Nayak et al., 2004).

4.4 ANN AND ANFIS

The ANN and ANFIS models for rice yield forecasting were developed using data acquired from the Muda Agricultural Development Authority (MUDA), Kedah, Malaysia, ranging from 1995 to 2001. There are 4 areas with 27 locations, and two types of season that influence the rice yield in Malaysia. The rice yield series data are used in this study to demonstrate the effectiveness of the hybrid method; these time series come from different locations and have different statistical characteristics. The data contain the yields from 1995 to 2001, giving a total of 351 observations. Given this set of observations made at uniformly spaced intervals, the locations of the rice yields are rescaled so that the time axis becomes a set of consecutive integers: the first location in 1995 is written as time 1, the second location in 1995 as time 2, and so on. The time series plot is given in Figure 4.2.

Figure 4.2 Rice yields series (1995-2001)

In order to assess the forecasting performance of the different models, the data set is divided into two samples: the first is used for training the network (modeling the time series) and the remainder for testing the performance of the trained network (forecasting). The data from 1995 to 2001, 351 observations, are used for training, and the remaining 27 observations are held out for forecasting purposes. The performance of each model on both the training data and the forecasting data is evaluated using the mean absolute error (MAE) and the root-mean-square error (RMSE), which are widely used for evaluating the results of time series forecasting. The MAE and RMSE are defined as

MAE = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right|    (10)

RMSE = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 }    (11)

where y_t and \hat{y}_t are the observed and forecasted rice yields at time t. The criterion for the best model is a relatively small MAE and RMSE.

4.5 IMPLEMENTATION OF ANN

In this investigation, we only consider one-step-ahead forecasting of the 27 held-out observations. Before the training process begins, data normalization is performed, using the linear transformation to [0, 1]:

x_n = \frac{y_0}{y_{max}}

where x_n and y_0 represent the normalized and original data, and y_{max} is the maximum value among the original data.

To configure the neural network used in the forecast, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to determine the maximum number of input neurons used during training. Figure 4.3 presents the ACF and PACF of the rice yields time series. The input variables for the ANN are selected from the lags with high ACF and PACF values; based on these analyses, a maximum of 27 lags was identified as suitable input for the proposed ANN. A single neuron in the output layer represents the value being modeled. All the data were normalized to the range 0 to 1.
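A minimal sketch of the normalization and of the two error measures of Equations (10) and (11):

import numpy as np

def normalize(y):
    # Linear transformation to [0, 1]: x_n = y_0 / y_max.
    y = np.asarray(y, dtype=float)
    return y / y.max()

def mae(y, y_hat):
    # Mean absolute (relative) error, Equation (10).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    # Root-mean-square error, Equation (11).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))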
After the input and output variables were selected, an ANN architecture of 27-H-1 was explored to capture the complex, non-linear, seasonal behavior of the rice yields data. The network was trained for 5000 epochs using the back-propagation algorithm with a learning rate of 0.001 and a momentum coefficient of 0.9. Table 4.1 shows the performance of the ANN during training as the number of neurons in the hidden layer (H) varies.

Table 4.1 Performance variation of a three-layer ANN during training with the number of neurons in the hidden layer

Neurons in hidden layer  RMSE   MAE
3                        14093  0.118
9                        15733  0.129
15                       15043  0.115
21                       14263  0.114
27                       14107  0.114
33                       13794  0.113
39                       13768  0.106
45                       12863  0.099
51                       13199  0.102
57                       13334  0.103
63                       12590  0.097
70                       12791  0.101

It is observed that the performance of the ANN improves as the number of hidden neurons increases. However, too many neurons in the hidden layer may cause an over-fitting problem, in which the network learns and memorizes the data very well but lacks the ability to generalize, while too few neurons may prevent the network from learning at all. An ANN with 63 neurons in the hidden layer therefore seems appropriate.

4.6 IMPLEMENTATION OF ANFIS

The ANFIS configuration is obtained through a trial-and-error process. One of the most important steps in developing a satisfactory forecasting ANFIS model is the selection of the input variables among the available variables. In this study, models with various inputs are first trained and tested by the ANFIS method, and their performances on the rice yields are compared and evaluated on the basis of training and forecasting performance. The structure of the forecasting models can be expressed as

y(t) = f(y(t-1), y(t-2), ..., y(t-k))    (12)

where y(t) represents the rice yield at time t.

Different numbers of membership functions were tried: two, three, four and five. Various membership function shapes, namely trapezoidal, sigmoid, Gaussian and generalized bell, were also considered. For the output, before the defuzzification process, we select linear models, which indicates that the generated model is a first-order Takagi-Sugeno (Type-I) model.

The results, in terms of various performance statistics, of all the ANFIS models are presented in Tables 4.2 and 4.3. From the experimental results we find that most of the ANFIS configurations suffer from slow convergence, and almost all of them fail to reach the training target, especially when more membership functions are used and the number of inputs exceeds four. The final model was chosen according to the smallest errors. The analysis revealed that three symmetric Gaussian membership functions with four inputs give the lowest RMSE and MAE. The network that performs best is chosen as the final model for forecasting the 27 held-out rice yield observations.
Table 4.2 Different structures of the ANFIS

Membership function: generalized bell

MF number  Criterion  2 inputs  3 inputs  4 inputs
2          MAE        0.136     0.122     0.111
2          RMSE       16321     16215     13839
3          MAE        0.132     0.118     0.286
3          RMSE       16066     14863     42073
4          MAE        16066     0.111     7.590
4          RMSE       1.5891    14673     8737933
5          MAE        0.123     0.231     -
5          RMSE       16031     34048     -

Table 4.3 Different structures of the ANFIS (continued)

Membership function: symmetric Gaussian

MF number  Criterion  2 inputs  3 inputs  4 inputs
2          MAE        0.129     0.120     0.111
2          RMSE       15988     15726     13782
3          MAE        0.128     0.136     0.101
3          RMSE       15954     16628     12945
4          MAE        0.126     0.109     310339
4          RMSE       15821     14252     487690670
5          MAE        0.123     Inf       -
5          RMSE       15953     15851     -

Membership function: sigmoid

MF number  Criterion  2 inputs  3 inputs  4 inputs
2          MAE        0.136     0.117     0.119
2          RMSE       16231     14626     15684
3          MAE        0.131     9.576     4703.250
3          RMSE       16004     13333672  9709706600
4          MAE        0.129     0.339     1.000
4          RMSE       16025     86185     101693
5          MAE        0.133     12312     1.000
5          RMSE       16910     6795058   101693

Membership function: trapezoidal

MF number  Criterion  2 inputs  3 inputs  4 inputs
2          MAE        0.129     0.127     0.122
2          RMSE       16035     16965     15062
3          MAE        0.142     0.126     0.475
3          RMSE       16643     16700     74466
4          MAE        0.131     0.437     11.978
4          RMSE       16023     106423    19406510
5          MAE        0.141     2.346     89796.671
5          RMSE       27228     1980899   72302242000

4.7 PERFORMANCE COMPARISON

The values predicted by the adaptive neuro-fuzzy inference system (ANFIS) are compared with the observed values, and the forecasting accuracy is evaluated by comparison with the ANN model. The best ANFIS results during training are compared with the ANN model in order to obtain the best forecasting model of rice yields. The performances of the best models developed by ANN and ANFIS on the forecasting data sets are summarized in Table 4.4. As can be seen, the ANFIS model produces smaller errors than the ANN model in terms of both RMSE and MAE.

Table 4.4 Rice yields forecast results

Performance criterion  ANN        ANFIS
MAE (%)                0.1103     0.101
RMSE                   16782.406  12945

Figure 4.5 shows overall summary statistics of the rice yield forecasts of the models using box-plots. It demonstrates that the ANFIS model's performance is, in general, accurate and satisfactory: all the error data points lie quite near zero.

4.8 CONCLUSION

The ANN and ANFIS models were developed to predict rice yield production using real data acquired from the Muda Agricultural Development Authority (MUDA), Kedah. The prediction performances of both models are measured in terms of the mean absolute error (MAE) and root-mean-square error (RMSE).

Figure 4.3 Comparison of the ANN and ANFIS models

Based on the performance of these models, it can be concluded that ANFIS is an effective method for forecasting rice yields. The results suggest that the ANFIS method is superior to the ANN method in time-series modeling, because ANFIS combines the learning capabilities of a neural network with the reasoning capabilities of fuzzy logic, and thus has an extended prediction capability compared to a single ANN or fuzzy logic technique. The results of the ANFIS model show that it can be applied successfully to develop time-series forecasting models that provide accurate forecasting and modeling of time-series data.

4.9 REFERENCES

Chang, F. J. and Chang, Y. T. (2006). Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Advances in Water Resources. 29: 1-10.
Chang, F. J., Hu, H. F. and Chen, Y. C. (2001). Counterpropagation fuzzy-neural network for river flow reconstruction.
Hydrological Processes. 15: 219-232.
Chen, S. H., Lin, Y. H., Chang, L. C. and Chang, F. J. (2006). The strategy of building a flood forecast model by neuro-fuzzy network. Hydrological Processes. 20: 1525-1540.
Firat, M. and Gungor, M. (2007). River flow estimation using adaptive neuro-fuzzy inference system. Mathematics and Computers in Simulation. 75(3-4): 87-96.
Firat, M. (2007). Watershed modeling by adaptive neuro-fuzzy inference system approach. PhD thesis, Pamukkale University, Turkey (in Turkish).
Ho, S. L., Xie, M. and Goh, T. N. (2002). A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers and Industrial Engineering. 42: 371-375.
Jang, J. S. R., Sun, C. T. and Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice Hall.
Liong, S. Y., Lim, W. H., Kojiri, T. and Hori, T. (2000). Advance flood forecasting for flood stricken Bangladesh with a fuzzy reasoning method. Hydrological Processes. 14: 431-448.
Mahabir, C., Hicks, F. E. and Fayek, A. R. (2000). Application of fuzzy logic to the seasonal runoff. Hydrological Processes. 17: 3749-3762.
Murat, Y. S. (2006). Comparison of fuzzy logic and artificial neural networks approaches in vehicle delay modeling. Transportation Research Part C: Emerging Technologies. 14: 316-334.
Nayak, P. C., Sudheer, K. P., Ragan, D. M. and Ramasastri, K. S. (2004). A neuro-fuzzy computing technique for modeling hydrological time series. Journal of Hydrology. 29: 52-66.
Ozelkan, E. C. and Duckstein, L. (2001). Fuzzy conceptual rainfall-runoff models. Journal of Hydrology. 253: 41-68.
Saad, P., Bakri, A., Kamarudin, S. S. and Jaafar, M. N. (2006). Intelligent Decision Support System for Rice Yield Prediction in Precision Farming. IRPA Report.
Sen, Z. and Altunkaynak, A. (2006). A comparative fuzzy logic approach to runoff coefficient and runoff estimation. Hydrological Processes. 20: 1993-2009.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing. 50: 159-175.
Zou, H. F., Xia, G. P., Yang, F. T. and Wang, H. Y. (2007). An investigation and comparison of artificial neural network and time series models for Chinese food grain price forecasting. Neurocomputing. 70: 2913-2923.

5
HYBRIDIZATION OF SOM AND GENETIC ALGORITHM TO DETECT OF UNCERTAINTY IN CLUSTER ANALYSIS

E. Mohebi
Mohd. Noor Md. Sap

5.1 INTRODUCTION

The self-organizing map (SOM), proposed by Kohonen (1982), has been widely used in industrial applications such as pattern recognition, biological modeling, data compression, signal processing and data mining (Kohonen, 1997; Mohebi and Sap, 2008a, 2008b; Sap and Mohebi, 2008a). It is an unsupervised and nonparametric neural network approach. The success of the SOM algorithm lies in its simplicity, which makes it easy to understand, to simulate and to use in many applications. The basic SOM consists of neurons usually arranged in a two-dimensional structure, such that there are neighborhood relations among the neurons. After completion of training, each neuron is attached to a feature vector of the same dimension as the input space. By assigning each input vector to the neuron with the nearest feature vector, the SOM is able to divide the input space into regions (clusters) with common nearest feature vectors.
This process can be considered as performing vector quantization (VQ) (Gray, 1984). Also, because of the neighborhood relations contributed by the interconnections among neurons, the SOM exhibits another important property: topology preservation.

Clustering algorithms attempt to organize unlabeled input vectors into clusters such that points within a cluster are more similar to each other than to vectors belonging to different clusters (Pal et al., 1993). The clustering methods are of five types: hierarchical, partitioning, density-based, grid-based and model-based clustering (Han and Kamber, 2000). Rough set theory employs an upper and a lower threshold in the clustering process, which results in the appearance of rough clusters. This technique can also be defined incrementally, i.e. the number of clusters is not predefined by the user.

In this paper, a new two-level clustering algorithm is proposed. The idea is that at the first level the data are trained by the SOM neural network, and at the second level a rough set based incremental clustering approach (Ashraf et al., 2006) is applied to the output of the SOM, requiring only a single scan of the neurons. The optimal number of clusters can be found by rough set theory, which groups the given neurons into a set of overlapping clusters (and thereby clusters the mapped data as well). The overlapped neurons are then assigned to the true clusters they belong to by applying a genetic algorithm, which is adopted to minimize the uncertainty arising from some clustering operations. In our previous work (Sap and Mohebi, 2008a), a hybrid of the SOM and rough sets was applied to capture the ambiguity involved in clusters, but the experimental results show that the algorithm proposed here (Genetic Rough SOM) outperforms the previous one.

This paper is organized as follows: Section 5.2 outlines the basics of the self-organizing map algorithm; Section 5.3 describes incremental clustering and the rough set theory approach; Section 5.4 describes the essence of the genetic algorithm; Section 5.5 presents the proposed algorithm; Section 5.6 is dedicated to experimental results; Section 5.7 provides a brief conclusion and future works; and Section 5.8 gives a brief summary.

5.2 SELF ORGANIZING MAP

Competitive learning is an adaptive process in which the neurons in a neural network gradually become sensitive to different input categories, i.e. sets of samples in a specific domain of the input space. A division of neural nodes emerges in the network to represent different patterns of the inputs after training. The division is enforced by competition among the neurons: when an input x arrives, the neuron that is best able to represent it wins the competition and is allowed to learn it even better. If there exists an ordering between the neurons, i.e. the neurons are located on a discrete lattice, the competitive learning algorithm can be generalized: not only the winning neuron but also its neighboring neurons on the lattice are allowed to learn. The whole effect is that the final map becomes an ordered map in the input space. This is the essence of the SOM algorithm. The SOM consists of m neurons located on a regular low-dimensional grid, usually one- or two-dimensional.
The lattice of the grid is either hexagonal or rectangular. The basic SOM algorithm is iterative. Each neuron i has a d-dimensional feature vector w_i = [w_{i1}, ..., w_{id}]. At each training step t, a sample data vector x(t) is randomly chosen from the training set, and the distances between x(t) and all the feature vectors are computed. The winning neuron, denoted by c, is the neuron whose feature vector is closest to x(t):

c = \arg\min_i \| x(t) - w_i \|, \quad i \in \{1, ..., m\}    (1)

A set of neighboring nodes of the winning node is denoted as N_c. We define h_{ic}(t) as the neighborhood kernel function around the winning neuron c at time t; it is a non-increasing function of time and of the distance of neuron i from the winning neuron c. The kernel can be taken as a Gaussian function:

h_{ic}(t) = \exp\left( - \frac{\| p_i - p_c \|^2}{2\sigma(t)^2} \right)    (2)

where p_i is the coordinate of neuron i on the output grid and \sigma(t) is the kernel width. The weight update rule in the sequential SOM algorithm can be written as

w_i(t+1) = w_i(t) + \varepsilon(t)\, h_{ic}(t)\, (x(t) - w_i(t)) for i \in N_c, and w_i(t+1) = w_i(t) otherwise    (3)

Both the learning rate \varepsilon(t) and the neighborhood width \sigma(t) decrease monotonically with time. During training, the SOM behaves like a flexible net that folds onto the cloud formed by the training data. Because of the neighborhood relations, neighboring neurons are pulled in the same direction, and thus the feature vectors of neighboring neurons resemble each other. There are many variants of the SOM (Yan and Yaoguang, 2005; Sap and Mohebi, 2008b); however, these variants are not considered in this paper because the proposed algorithm is based on the SOM rather than being a new variant of it.

The 2D map can be easily visualized and thus gives useful information about the input data. The usual way to display the cluster structure of the data is to use a distance matrix, such as the U-matrix (Ultsch and Siemon, 1990). The U-matrix method displays the SOM grid according to the distances between neighboring neurons: clusters are identified by low inter-neuron distances and borders by high inter-neuron distances. Another method of visualizing cluster structure is to assign the input data to their nearest neurons; neurons to which no input data are assigned can then be used as the borders of clusters (Zhang and Li, 1993).
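To summarize the training equations, a minimal Python sketch of one sequential SOM step combining Equations (1)-(3) is given below; the Gaussian kernel itself damps the update of neurons far from the winner, so no explicit neighborhood set N_c is maintained, and the decay schedules for ε(t) and σ(t) are left to the caller.

import numpy as np

def som_step(W, P, x, eps, sigma):
    # One sequential SOM update.
    # W : m x d feature vectors;  P : m x 2 grid coordinates
    # x : input sample;  eps, sigma : current learning rate and kernel width
    c = np.argmin(np.linalg.norm(W - x, axis=1))                     # eq. (1)
    h = np.exp(-np.sum((P - P[c]) ** 2, axis=1) / (2 * sigma ** 2))  # eq. (2)
    return W + eps * h[:, None] * (x - W)                            # eq. (3)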
An incremental clustering algorithm for dynamic information processing was presented by Can (1993). The motivation behind this work is that, in dynamic databases, items might get added and deleted over time. These changes should be reflected in the partition generated without significantly affecting the current clusters. This algorithm was used to cluster incrementally a database of 12,684 documents. The quality of a conventional clustering scheme is determined using within group error (Sharma and Werner, 1993) Δ given by: Δ = ∑ ∑ distance m i =1 u h , u k ∈C i (u h , u k ) u h , u k are objects in the same cluster C i . (4) 88 Advances in Artificial Intelligence Applications Incremental_Clustering (Data, Thr){ Cluster_Leader = d1; While (there is unlabeled data){ For (i = 2 to N) If (dis (Cluster_Leader, di) <= Thr) Put di in the same cluster as Cluster_Leader; Else // new Cluster Cluster_Leader = di; }//end of while } Figure 5.1 5.3.2 Incremental clustering algorithm Rough set Incremental clustering This algorithm is a soft clustering method employing rough set theory (Pawlak, 1982). It groups the given data set into a set of overlapping clusters. Each cluster is represented by a lower approximation and an upper approximation ( A(C ), A (C )) for every cluster C ⊆ U . Here U is a set of all objects under exploration. However, the lower and upper approximations of Ci ∈ U are required to follow some of the basic rough set properties such as: 1. 0/ ⊆ A(Ci ) ⊆ A (Ci ) ⊆ U 2. A(Ci ) ∩ A(C j ) = 0/ , i ≠ j 89 Hybridization of SOM and Genetic Algorithm to Detect of Uncertainty in Cluster Analysis 3. A(Ci ) ∩ A (C j ) = 0/ , i ≠ j 4. If an object u k ∈ U is not part of any lower approximation, then it must belong to two or more upper approximations. Note that (1)-(4) are not independent. However enumerating them will be helpful in understanding the basic of rough set theory. The lower approximation A(C ) contains all the patterns that definitely belong to the cluster C and the upper approximation A (C ) permits overlap. Since the upper approximation permits overlaps, each set of data points that are shared by a group of clusters define indiscernible set. Thus, the ambiguity in assigning a pattern to a cluster is captured using the upper approximation. Employing rough set theory, the proposed clustering scheme generates soft clusters (clusters with permitted overlap in upper approximation) see Figure 5.2. A high-level description of a rough incremental algorithm is as Figure 5.3 (Lingras et al., 2004). b b bb b a a b b b b b b b b a ba a a b ba b b b b b b b b b b b a a a a a a a a a a a aa a a a a a a a a a a a a aa a a a a Upper Threshold Lower Threshold Figure 5.2 Rough set Incremental clustering 90 Advances in Artificial Intelligence Applications For a rough set clustering scheme and given two objects u h , u k ∈ U we have three distinct possibilities: 1. Both u k and u h are in the same lower approximation A(C ) . 2. Object u k is in lower approximation A(C ) and u h is in the corresponding upper approximation A (C ) , and case 1 is not applicable. 3. Both u k and u h are in the same upper approximation A (C ) , and case 1 and 2 are not applicable. 
Rough_Incremental(Data, upper_Thr, lower_Thr){
    Cluster_Leader = d1;
    For (i = 2 to N){
        If (distance(Cluster_Leader, di) <= lower_Thr)
            Put di in the lower approximation of Cluster_Leader;
        ElseIf (distance(Cluster_Leader, di) <= upper_Thr)
            Put di in every existing cluster j (j = 1 to k) for which
                distance(Cluster_Leaderj, di) <= upper_Thr;
        Else  // new cluster
            Cluster_Leader = di;
    }
}

Figure 5.3 Rough set incremental clustering algorithm

For these possibilities, three variants of equation (4) can be defined as follows:

$$\Delta_1 = \sum_{i=1}^{m} \sum_{u_h, u_k \in \underline{A}(X_i)} \mathrm{distance}(u_h, u_k), \quad
\Delta_2 = \sum_{i=1}^{m} \sum_{u_h \in \underline{A}(X_i),\; u_k \in \overline{A}(X_i)} \mathrm{distance}(u_h, u_k), \quad
\Delta_3 = \sum_{i=1}^{m} \sum_{u_h, u_k \in \overline{A}(X_i)} \mathrm{distance}(u_h, u_k) \qquad (5)$$

The total error of rough set clustering is then a weighted sum of these errors:

$$\Delta_{total} = w_1 \Delta_1 + w_2 \Delta_2 + w_3 \Delta_3, \quad \text{where } w_1 > w_2 > w_3 \qquad (6)$$

Since $\Delta_1$ corresponds to situations where both objects definitely belong to the same cluster, the weight $w_1$ should have the highest value.

5.4 GENETIC ALGORITHM

The genetic algorithm was proposed by John Holland in the early 1970s. It applies some of the mechanisms of natural evolution, such as crossover, mutation, and survival of the fittest, to optimization and machine learning. GA provides a very efficient search method that works on a population, and it has been applied to many optimization and classification problems (Goldberg, 1989).

The general GA process is as follows:

1. Initialize the population of genes.
2. Calculate the fitness of each individual in the population.
3. Reproduce the selected individuals to form a new population according to each individual's fitness.
4. Perform crossover and mutation on the population.
5. Repeat steps (2) through (4) until some condition is satisfied.

The crossover operation swaps part of the genetic bit string between parents. It emulates the crossover of genes in the real world, where descendants inherit characteristics from both parents. The mutation operation inverts some bits of the whole bit string at a very low rate, just as mutants appear only rarely in the real world. Figure 5.4 shows how the crossover and mutation operations are applied in the genetic algorithm. Each individual in the population evolves to a higher fitness generation by generation.

Figure 5.4 Crossover and mutation operations of the genetic algorithm (crossover: parents 0100000010 and 1110010111 produce offspring 0100000111 and 1110010010; mutation: 0111000110 becomes 0111010110)

5.5 ROUGH CLUSTERING OF THE SOM USING GENETIC ALGORITHM

In this paper a rectangular grid is used for the SOM. Before the training process begins, the input data are normalized; this prevents any one attribute from overpowering the clustering criterion. The normalization of a new pattern $X_i = \{x_{i1}, \ldots, x_{id}\}$ for $i = 1, 2, \ldots, N$ is as follows:

$$X_i = \frac{X_i}{\| X_i \|} \qquad (7)$$

Once the training phase of the SOM neural network is completed, the output grid of neurons, which is now stable under network iteration, is clustered by applying the rough set algorithm described in the previous section. The similarity measure used for rough set clustering of neurons is the Euclidean distance (the same one used for training the SOM). In the proposed method (see Figure 5.5), neurons to which no data were ever mapped are excluded from processing by the rough set algorithm.
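Before describing the gene-based optimization, the following minimal Python sketch recalls the bit-string GA loop of Section 5.4 (the encoding, rates, and fitness are assumptions made for this illustration, not the book's configuration):

import random

def evolve(population, fitness, cx_rate=0.25, mut_rate=0.001):
    # Fitness-proportional (roulette-wheel) selection of parents.
    scores = [fitness(ind) for ind in population]
    parents = random.choices(population, weights=scores, k=len(population))
    next_gen = []
    for a, b in zip(parents[0::2], parents[1::2]):
        # One-point crossover swaps the tails of two parent bit strings.
        if random.random() < cx_rate:
            p = random.randrange(1, len(a))
            a, b = a[:p] + b[p:], b[:p] + a[p:]
        # Bit-flip mutation inverts each bit with a very low probability.
        for child in (a, b):
            bits = [('1' if c == '0' else '0') if random.random() < mut_rate else c
                    for c in child]
            next_gen.append(''.join(bits))
    return next_gen

# Example: maximize the number of 1-bits in 10-bit strings.
pop = [''.join(random.choice('01') for _ in range(10)) for _ in range(50)]
for _ in range(100):
    pop = evolve(pop, fitness=lambda s: s.count('1') + 1)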
From the rough set algorithm it can be observed that if two neurons are defined as indiscernible (neurons in the upper approximation of two or more clusters), they have a certain level of similarity with respect to the clusters they belong to, and that similarity relation has to be symmetric. Thus, the similarity measure must be symmetric. Through the rough set clustering of the SOM, the overlapped neurons, and correspondingly the overlapped data (those data in the upper approximation), are detected. In the experiments, to calculate errors and uncertainty, the previous equations are applied to the results of the SOM (clustered and overlapped data). Then, for each overlapped neuron, a gene is generated that represents the alternative distances from each cluster leader. Figure 5.6 shows an example of the genes generated for m overlapped neurons on n existing cluster leaders.

Figure 5.5 Rough clustering of the SOM (each cluster is shown with its lower and upper approximations; the overlapped neurons are highlighted)

After the genes have been generated, the genetic algorithm is employed to minimize the following fitness function, which represents the total sum of the distances $d_j$ in the related genes:

$$F = \sum_{i=1}^{m} \sum_{j=1}^{n} g_i(d_j) \qquad (8)$$

Figure 5.6 Generated genes, where m is the number of overlapped neurons and n is the number of existing clusters; each gene holds the distances $d_1, \ldots, d_n$ to the cluster leaders, and the highlighted $d_i$ is the optimized one that minimizes the fitness function

The aim of the proposed approach is to make the genetic rough set clustering of the SOM as precise as possible. Therefore, a precision measure is needed for evaluating the quality of the proposed approach. A possible precision measure can be defined as the following equation (Pawlak, 1982):

$$\mathrm{certainty} = \frac{\text{Number of objects in the lower approximation}}{\text{Total number of objects}} \qquad (9)$$

5.6 EXPERIMENTAL RESULTS

To demonstrate the effectiveness of the proposed clustering algorithm GR-SOM (Genetic Rough set Incremental clustering of the SOM), two phases of experiments were conducted on the well-known Iris data set from the UC Irvine Machine Learning Repository, which has been widely used in pattern classification. It has 150 data points of four dimensions, divided into three classes with 50 points each. The first class of Iris plant is linearly separable from the other two; the other two classes overlap to some extent. The first phase of the experiments presents the uncertainty that comes from the data set, and in the second phase the errors are generated. The results of GR-SOM and Rough set Incremental SOM (RI-SOM) (Mohebi and Sap, 2008b) are compared to the Incremental clustering of the SOM (I-SOM) (Sap and Mohebi, 2008a). The input data are normalized so that the value of each datum in each dimension lies in $[0,1]$. For training, a $10 \times 10$ SOM with 100 epochs on the input data is used. The general parameters of the genetic algorithm are configured as in Table 5.1.

Table 5.1 General parameters of the genetic algorithm used in the experiments

Parameter               Value
Population size         50
Number of evaluations   10
Crossover rate          0.25
Mutation rate           0.001
Number of generations   100

Figure 5.7 shows the certainty generated from epoch 100 to 500 by (9) on the mentioned data set.
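As a small illustration of the precision measure in (9) (a sketch with assumed data structures, not the authors' code), the certainty can be computed directly from the rough clustering output:

def certainty(lower_approximations, total_objects):
    # Eq. (9): objects that definitely belong to some cluster (the lower
    # approximations) over all objects; overlapped objects lower the value.
    in_lower = sum(len(cluster) for cluster in lower_approximations)
    return 100.0 * in_lower / total_objects

# Example: 135 of 150 Iris points fall in some lower approximation.
print(certainty([range(48), range(45), range(42)], 150))  # -> 90.0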
From the obtained certainty it is evident that GR-SOM can efficiently detect the overlapped data that have been mapped by overlapped neurons (Table 5.2).

Table 5.2 Certainty level of I-SOM, RI-SOM and GR-SOM on the Iris data set

Epoch     100     200     300     400     500
I-SOM     33.33   65.23   76.01   89.47   92.01
RI-SOM    67.07   73.02   81.98   91.23   97.33
GR-SOM    69.45   74.34   83.67   94.49   98.01

Figure 5.7 Generated certainty level of I-SOM, RI-SOM and GR-SOM on the Iris data set from epoch 100 to 500

In the second phase, the same initialization of the SOM was used. The errors that come from the data set, according to (5) and (6), were generated by our proposed algorithms (Table 5.3). The weighted sum (6) was configured as in (10):

$$\sum_{i=1}^{3} w_i = 1, \quad \text{with } w_i = \frac{1}{6}(4 - i) \text{ for each } w_i \qquad (10)$$

Table 5.3 Comparative generated error on the Iris data set

Method    Δ1     Δ2     Δ3     Δtotal
GR-SOM    1.05   0.85   0.04   1.4
I-SOM     -      -      -      2.8

5.7 CONCLUSION AND FUTURE WORK

In this paper a two-level clustering approach (GR-SOM) has been proposed to predict clusters of high-dimensional data and to detect the uncertainty that comes from overlapping data. The approach is based on rough set theory, which yields a soft clustering that can detect overlapped data in the data set and make the clustering as precise as possible; GA is then applied to find the true cluster for each overlapped datum. The results of both phases indicate that GR-SOM is more accurate and generates fewer errors than crisp clustering (I-SOM). The proposed algorithm accurately detects overlapping clusters in clustering operations. As future work, the overlapped data could also be assigned correctly to the true clusters they belong to by assigning fuzzy membership values to the indiscernible set of data. A weight could also be assigned to each data dimension to improve the overall accuracy.

5.8 SUMMARY

Researchers have recently found that, to capture the uncertainty involved in cluster analysis, it is not sufficient to apply only one threshold to determine the cluster boundaries. In this paper, to reduce the uncertainty, a new combination of the Kohonen Self-Organizing Map (a popular tool for clustering data), rough set theory, and the genetic algorithm is proposed. The proposed two-level algorithm, which first uses the SOM to produce the prototypes and then applies rough sets and the genetic algorithm to assign the overlapped data to the true clusters they belong to, is found to be more accurate than the crisp clustering algorithm (I-SOM) and reduces the errors.

5.9 REFERENCES

Asharaf, S., Narasimha, M. M. and Shevade, S. K. (2006). Rough set based incremental clustering of interval data. Pattern Recognition Letters, Vol. 27, pp. 515-519.
Can, F. (1993). Incremental clustering for dynamic information processing. ACM Trans. Inf. Systems (11) 2, pp. 143-164.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Co. Inc.
Gray, R. M. (1984). Vector quantization. IEEE Acoust. Speech, Signal Process. Mag. 1 (2), pp. 4-29.
Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.
Irvine, U. C. (1987). Machine Learning Repository Database. Available at: http://archive.ics.uci.edu [Accessed 12 April 2008].
Jain, A. K., Murty, M. N. and Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (31) (3), pp. 264-323.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biol. Cybern. 43, pp. 59-69.
Kohonen, T. (1997). Self-Organizing Maps. Springer, Berlin, Germany.
Lingras, P. J. and West, C. (2004). Interval set clustering of web users with rough K-means. J. Intelligent Inf. Syst. (23) (1), pp. 5-16.
Mohebi, E. and Sap, M. N. M. (2008a). Hybrid Self Organizing Map for overlapping clusters. In Springer-Verlag Proceedings of the CCIS. Hainan Island, China, 2009. Accepted.
Mohebi, E. and Sap, M. N. M. (2008b). Rough set based clustering of the Self Organizing Map. In IEEE Computer Society Proceedings of the 1st Asian Conference on Intelligent Information and Database Systems. Dong Hoi, Vietnam, 2008.
Pal, N. R., Bezdek, J. C. and Tsao, E. C. K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Networks (4), pp. 549-557.
Pawlak, Z. (1982). Rough sets. Internat. J. Computer Inf. Sci. (11), pp. 341-356.
Sap, M. N. M. and Mohebi, E. (2008a). A novel clustering of the SOM using rough set. In IEEE Proceedings of the 6th Student Conference on Research and Development. Johor, Malaysia, 2008. Accepted.
Sap, M. N. M. and Mohebi, E. (2008b). Outlier detection methodologies: a review. Journal of Information Technology, UTM, Vol. 20, Issue 1, 2008, pp. 87-105.
Sharma, S. C. and Werner, A. (1981). Improved method of grouping provincewide permanent traffic counters. Transportation Research Record 815, Washington D.C., pp. 13-18.
Stahl, H. (1986). Cluster analysis of large data sets. In Classification as a Tool of Research. W. Gaul and M. Schader, Eds. Elsevier North-Holland, Inc., New York, NY, pp. 423-430.
Ultsch, A. and Siemon, H. P. (1990). Kohonen's self organizing feature maps for exploratory data analysis. Proceedings of the International Neural Network Conference, Dordrecht, Netherlands, pp. 305-308.
Zhang, X. and Li, Y. (1993). Self-organizing map as a new method for clustering and data analysis. Proceedings of the International Joint Conference on Neural Networks, Nagoya, Japan, pp. 2448-2451.
Yan and Yaoguang (2005). Research and application of SOM neural network based on kernel function. Proceedings of ICNN&B'05 (1), pp. 509-511.

6

A MINING-BASED APPROACH FOR SELECTING BEST RESOURCES NODES ON GRID RESOURCE BROKER

Asgarali Bouyer
Mohd Noor Md Sap

6.1 INTRODUCTION

Nowadays, Grid computing has been accepted as an infrastructure for performing parallel computing on distributed computational resources (Karl et al., 2001). A grid has users, resources, and an information service (IS). Grid computing is a new technology for creating distributed infrastructures and virtual organizations (VOs) for very large-scale computing or enterprise applications. In a grid environment, the computational resource is the main part of the system; it can be a desktop PC, a cluster machine, or a supercomputer. A main goal of grid computing is to enable applications to identify resources dynamically and to create distributed computing environments that can utilize computing resources on demand (Karl et al., 2001). A resource broker is fundamental in any large-scale Grid environment.
Since grid resources are geographically distributed and heterogeneous, the task of a grid resource broker and scheduler is to dynamically identify and characterize the available resources, and to select and allocate the most appropriate resources for a given job. In particular, the heterogeneity of resources and the lack of ownership are two major issues that must be dealt with when designing a GRB. In a broker-based management system, brokers are responsible for selecting the best nodes and for ensuring the trustworthiness of the service provider. Resource selection is an important issue in a grid environment, where consumers and service providers are distributed geographically across multiple administrative domains. Choosing a suitable resource for a user job that meets predefined constraints such as deadline, speedup, and cost of execution is an important problem in grids (Klaus et al., 2002). Our approach addresses several of these problems. In this paper we do not propose a resource discovery method; rather, we present a novel way of selecting the best nodes from the pool of discovered nodes. Resource selection involves a set of factors including application execution time, available main memory, disk (secondary memory), resource access policies, and so on. Resource selection must also consider information about resource reliability, prediction error probability, and real-time execution. However, these various performance measures can only be considered if the middleware allows its internal scheduling to be adapted to the desired application services. We have considered all of these factors in our approach. To achieve a better selection, we use a decision tree combined with fuzzy logic theory (Motohide et al., 1994). Induced decision trees are an extensively researched solution to classification tasks. The use of fuzzy logic techniques is relevant in case representation, to allow for imprecise and uncertain values in features.

The rest of this paper is organized as follows. Section 6.2 overviews previous research on resource brokering and scheduling. Section 6.3 describes the fuzzy decision tree algorithm used in our method. Section 6.4 discusses the system design and implementation details of our OGSI-compliant grid resource broker service. Section 6.5 describes the experimental results, and Section 6.6 concludes the paper.

6.2 RELATED WORKS

Many projects, such as DI-GRUBER (Dumitrescu and Ian, 2005), eNANOS (Ivan et al., 2005), AppLes (Henri et al., 2000) and an OGSI-based broker (Seok et al., 2004), have been carried out on grids. In this section we introduce some of these brokers. DI-GRUBER (Dumitrescu and Ian, 2005), an extension to the GRUBER brokering framework, was developed as a distributed grid USLA-based resource broker that allows multiple decision points to coexist and cooperate in real time. GRUBER has been implemented in both Globus Toolkit 4 (GT4) and Globus Toolkit 3 (GT3). The part of DI-GRUBER that performs resource finding and selection is called the GRUBER engine. The GRUBER engine is the main component of the GRUBER architecture; it implements various algorithms to detect available resources and maintains a generic view of resource utilization in the grid (Dumitrescu and Ian, 2005).
GRUBER does not itself perform job submission, but it can be used in conjunction with any of various grid job submission infrastructures. The eNANOS resource broker is an OGSI-compliant resource broker developed as a Grid Service and supported by Globus Toolkit (GT2 and GT3) middleware (Ivan et al., 2005). The eNANOS architecture neither uses data mining methods to select the best nodes from the pool of discovered nodes, nor is it implemented in Web Services (WS) based frameworks. AppLes (Application Level Scheduling) focuses on developing scheduling agents for individual grid applications (Henri et al., 2000). AppLes agents have an application-oriented scheduling mechanism and use static or dynamic application and resource information to select a set of resources. However, they perform resource discovery and scheduling without considering resource owner policies, and they do not support system-oriented or extensible scheduling policies. Another resource broker service has been presented by Seok et al. (2004). It is an OGSI-based broker supported by GT3: a new general-purpose OGSI-compliant grid resource broker service that performs resource discovery and scheduling through close interactions with the GT3 Core and Base Services. This resource broker service considers resource owner policies as well as user requirements on the resources. The EZ-Grid project (Barbara et al., 2001) applies Globus services to make grid resource usage easier and more transparent for the user. This is achieved by developing easy-to-use interfaces coupled with brokerage systems that assist resource selection and the job execution process. Other work has been done in the resource selection field (e.g. Condor-G (James et al., 2001), Nimrod/G, LSF and so forth), but we cannot introduce all of it in this paper. Finally, we note that none of those systems or brokers uses machine learning methods to select the best nodes for the submitted jobs.

6.3 FUZZY DECISION TREE

Induced decision trees are an extensively researched solution to classification tasks. A general decision tree always produces a deterministic result, and this property is undesirable in some applications. Thus, by combining the decision tree with fuzzy logic we can achieve better decisions. The fuzzy decision tree (FDT) is the generalization of the decision tree to a fuzzy environment. The knowledge represented by a fuzzy decision tree is closer to human classification (Christophe and Bernadette, 2003). In our approach we use a fuzzy decision tree (FDT).

6.3.1 Fuzzy Logic (FL)

Essentially, Fuzzy Logic (FL) is a multi-valued logic that allows middle values to be defined between conventional evaluations like yes/no, true/false, black/white, etc. Fuzzy logic is an extension of Boolean logic that replaces binary truth values with degrees of truth. It was introduced in 1965 by Prof. L. Zadeh at the University of California, Berkeley (George and Bo, 1995). The basic notion of fuzzy systems is the fuzzy set. For example, the fuzzy set of climate may consist of members like "Very cold", "Cold", "Warm", "Hot", and "Very hot".
The theory of fuzzy sets enables us to structure and describe activities and observations that differ from each other only vaguely, to formulate them in models, and to use these models for various purposes, such as problem-solving and decision-making (George and Bo, 1995). We do not discuss such natural extensions of fuzzy sets here; more about fuzzy logic can be found in (Zadeh, 1984).

6.3.2 Fuzzy Decision Tree Algorithm

This algorithm is an extended version of ID3 that operates on fuzzy sets and produces a fuzzy decision tree (FDT). Other researchers (Motohide et al., 1994; Christophe and Bernadette, 2003) have previously applied the FDT, and their results showed that this algorithm is suitable for our approach. There are two important points in building and applying an FDT (Janikow, 1998):

• Selecting the best attribute at each node to develop the tree: there are many criteria for this purpose, and we use one of them.
• The inference procedure from the FDT: in the classification step for a new sample, we may encounter many leaf nodes that offer candidate classes with different confidences, so the mechanism for selecting the best fit is important.

Before we state the algorithm, we introduce some assumptions and notation:

• The set of training examples is called E, with N examples. Each property $A_j$ of an example contains $m_j$ linguistic (fuzzy) terms, which together determine the possible output classes.
• The set of examples present at node t is denoted by X.
• $\mu_{c_k}(x)$ represents the degree of membership of example x in class $c_k$.
• $\mu_{F_{jv}}(x_j)$ represents the degree of membership of the crisp value of attribute j in example x in the v-th fuzzy term of attribute j.

The algorithm also makes use of four working formulas, (1)-(4).

6.3.2.1 Creating a Fuzzy Decision Tree

Step 1: Start with all the training examples in the root node with their original weights (the degree of membership of each sample in its desired class is typically taken to be 1). In other words, all training examples are used with their initial weights, which are not necessarily 1.

Step 2: If, for a node t with fuzzy set X, one of the following conditions is true, the node is considered a leaf node:

• Con 1: for all examples of the set X, the proportion of the membership degree in one class to the sum of the membership degrees over all classes is greater than or equal to the threshold $\theta_r$.
• Con 2: the sum of the membership degrees of all data in the set X is less than the threshold $\theta_r$.
• Con 3: no further attribute is available for selection.

Step 3: If none of the conditions of Step 2 holds for the node, the node should be developed. Thus:

• Step 3.1: Find all attributes on the path from the root node to the current node and remove them from the attribute set, so that the remaining attributes are the candidates for selection.
• Step 3.2: From the remaining attributes $A_i$, select the attribute $A_{max}$ that best develops the tree according to the entropy measure (Christophe and Bernadette, 2003).
• Step 3.3: Split the set X into subsets $X_1, \ldots, X_{m_{A_{max}}}$, such that for all elements in $X_i$ there is a membership coefficient for the i-th fuzzy term of $A_{max}$.
• Step 3.4: For each of these subsets, define a node and label the edge with the corresponding fuzzy term value ($i = 1, 2, \ldots, m_{A_{max}}$). The degree of membership of each example in the new node is then computed by combining its current weight with its membership in the corresponding fuzzy term.
• Step 3.5: Replace X with each $X_i$ in turn and repeat Steps 2 and 3.
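To illustrate how crisp attribute values map to linguistic terms (a generic sketch: the triangular shapes and term ranges are assumptions, since the chapter's own membership formulas (1)-(4) are not reproduced here), a simple triangular membership function can be used:

def triangular(x, a, b, c):
    # Membership rises linearly from a to the peak b, then falls to c.
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Example: hypothetical fuzzy terms for a "CPU load" attribute in [0, 1].
terms = {"Low": (0.0, 0.0, 0.5), "Medium": (0.2, 0.5, 0.8), "High": (0.5, 1.0, 1.0)}
degrees = {name: triangular(0.65, *abc) for name, abc in terms.items()}
# degrees -> {'Low': 0.0, 'Medium': 0.5, 'High': 0.3}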
6.4 SYSTEM ARCHITECTURE

A general architecture for this approach is shown in Figure 6.1. Our application runs on top of GT3, but it can also be applied to GT4. For now, we have provided a stand-alone application that works on GT3 for this purpose. The result from every node is sent in an XML document and stored in a temporary XML database (TXD).

6.4.1 Miner Application

To do this, we install a Miner Application (MA) on every node of the target grid. The MA contains a small internal database (in the role of a log file). One of the primary tasks of the MA is writing the log file. When the node connects to the grid, the MA updates its log file (inserts a new record into the database); when a new job is submitted to the node, the MA updates the related record, because we want to know the number of jobs executed on the node. Whether the job finishes successfully or fails for any reason, the MA updates the log file (there is a Boolean field in the table which is set to TRUE if the related job finished successfully, and which otherwise indicates that the job failed). We have also assigned some new tasks to the Grid Resource Broker (GRB), which we call the Optimal GRB.

Figure 6.1 General architecture of our approach (each node runs a Miner Application with its log file; the broker layer contains the request broker (discovery), the TXD, the global job queue, resource monitoring, the resource selector with its fuzzy DT executer and node-selection unit, and job submission; the GT middleware layer contains MDS and the job scheduler)

Before the GRB selects any nodes for a given job, one of these new tasks is executed: besides its previous tasks, the GRB is responsible for sending a packet to each node on the grid. Needless to say, this task can be executed during the resource discovery operation of the GRB. Further, as already stated, there are many different methods for finding resources (nodes), but we do not concentrate on how nodes are discovered and will not discuss it in this paper. Suppose there are many different nodes in our grid that are ready to execute jobs, and we want to select some nodes from the pool of these nodes. At the beginning, the GRB sends a packet to each node connected to our grid. This packet contains some information about the new job (e.g. sender IP, size of the job, size of needed RAM and HDD, average time needed for execution, approximate execution start time, minimum CPU power, etc.). On the other side, when the MA on a node receives this packet, it opens the packet for analysis. If there are sufficient resources to run the desired job, the MA performs a data processing step on its own local database (log file) to compute some statistics for this job. Some of the produced results are as follows:

• Average Hit Ratio (AHR): the average rate of success over all previous jobs.
• Number of all jobs submitted to this node (AAJ).
• Number of all jobs submitted at this time of day, on previous days, to this node (AATPJ).
• Number of all jobs successfully finished at this time of day, on previous days (NSTP).
• Hit ratio for this time period on previous days (HRTP). For example, how many jobs were executed between 1:30 AM and 2:00 AM?
• Average size of successfully finished jobs (ASF).
• Average response time for finished jobs (ART).
• Average response time for jobs that have the same size as the proposed job and finished successfully (ARTSS).
• Hit ratio for the last twenty jobs (HRT).
• Date, time, and size of the last successfully finished job (LSJ).
• Date, time, and size of the last failed job (LFJ).
• Size of the largest successfully finished job (LSI).
• Number of all previous jobs that have almost the same size as the proposed job (ASS). Needless to say, the sizes of the previous jobs are not exactly the same as the size of the desired job; for example, for a job of size 340 KB we would count all previous jobs between 1 KB and 500 KB.
• Number of all previous jobs that have the same size as the proposed job and finished successfully (NSS).

Moreover, processor speed and CPU availability (idleness) are important for choosing a node. These results, together with the node information, are sent to the GRB from each node, where the GRB analyzes them to select or reject the nodes. Note that the GRB always saves the most recently collected results.

6.4.2 Broker Layer

In this layer we add two new sections beside the general broker sections. The first is related to the Request Broker section: it broadcasts the packet to all the nodes in the grid, then receives and saves the results sent by each node in the temporary XML database (TXD). Next, the Resource Selector section executes the fuzzy decision tree algorithm on the TXD (the gathered results). This task is performed in a sub-section inside the Resource Selector that we call the FDT executer. Whenever this algorithm has finished its task, the next sub-section, SNJ (Selecting Nodes for a Job), uses the result of the algorithm to identify suitable nodes.

6.4.2.1 FDT executer. This section executes the FDT algorithm on the TXD data. FDT is a machine learning technique for extracting knowledge that is closer to human decision making. In this research we use the FID3 algorithm, because it is reliable and flexible and has high accuracy in classifying samples. All the samples used for both training and testing are extracted from the provided database (TXD). After the FDT algorithm has been performed by the FDT executer, we can select a desired class for the proposed job. Jobs can be divided into several groups: high-reliability jobs, real-time jobs, normal jobs, testing jobs, etc.

6.4.2.2 SNJ sub-layer. Based on the results gathered from the FDT executer, this section selects appropriate nodes according to the job conditions. There are many parameters in this section, but the main ones to be considered are as follows:

Very high reliability jobs: if we want to execute the desired job successfully with high reliability (response time is not very important), the AHR, HRTP, ASF, and HRT measures are very important. There is a priority among these measures: for example, to achieve high reliability, AHR and then HRTP have high priority, though the other measures also matter. SNJ analyzes these measures from the gathered results (provided by the FDT executer). For example, if six nodes have almost the same AHR and HRTP, or ASF and so on, then other measures (e.g. ART or HRT) are used to evaluate the performance of these nodes.
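These measures originate on the node side; a minimal sketch of how the MA could derive some of them from its log follows (the record fields, helper names, and the "similar size" band are illustrative assumptions, not the book's actual schema):

from dataclasses import dataclass

@dataclass
class LogRecord:
    size_kb: float        # job size
    response_min: float   # response time in minutes
    success: bool         # the Boolean "finished successfully" field

def node_stats(log, job_size_kb):
    # AHR: average rate of success over all previous jobs.
    ahr = sum(r.success for r in log) / len(log)
    # HRT: hit ratio over the last twenty submitted jobs.
    last20 = log[-20:]
    hrt = sum(r.success for r in last20) / len(last20)
    # ARTSS: average response time of successful jobs of similar size
    # (here "similar" is a +/-50% band, an assumption for the sketch).
    similar = [r for r in log
               if r.success and 0.5 * job_size_kb <= r.size_kb <= 1.5 * job_size_kb]
    artss = sum(r.response_min for r in similar) / len(similar) if similar else None
    return {"AHR": ahr, "HRT": hrt, "ARTSS": artss}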
There may be situations in which SNJ cannot freely select the nodes it needs. For example, suppose SNJ needs to select seven nodes to perform the desired tasks, but only five nodes have high reliability (AHR and HRTP over 95%), while the remaining nodes have low reliability (less than 50%); the GRB can then use other parameters to decrease the risk. For the two remaining nodes, for instance, SNJ can consider the HRT parameter, which is better than random-based methods. All of this is done by SNJ. It can also use multi-versioning in a hierarchical architecture to increase reliability (Asgarali et al., 2007). In other words, it starts the versions on the candidate nodes in parallel and distributed form by dispatching replicas of an offered job to the best-selected nodes in a specific order. For example, to perform job1 we can use three nodes in hierarchical form and send replicas of this job to the desired nodes. When one of these nodes finishes the related job and successfully returns its results to the GRB, the GRB sends a message to stop and abort this task on the other nodes. In this way fault tolerance is improved, and so the reliability of finishing the related task is increased.

Execution in real time: if we want to execute a job in real time, the CPU speed and ART have the highest priority; the next priorities belong to ARTSS, HRTP, AHR, ASF, LSI, and so on. The processor's power and the communication line bandwidth are also important. In this approach we concentrate on the two kinds of jobs mentioned in this section.

For a fuzzy set, the idea of vagueness is introduced by assigning an indicator function that may take on values in the range 0 to 1. The following observations are considered:

• Count(Si): returns the number of successfully finished jobs on node i.
• Count(STi): returns the number of successfully finished jobs among the last 20 jobs submitted to node i.
• Count(AAJi): returns the number of all jobs submitted to node i.
• Min(ART): returns the minimum ART among all nodes.
• Max(ASF): returns the maximum ASF among all nodes.
• Min(CPU_SPi): returns the minimum CPU speed among all nodes.

Suppose that 1 ≤ i ≤ n (where n is the number of nodes); from these observations, the deterministic values are converted to fuzzy values through nine membership functions, A1 to A9, which are very important in deciding on node selection. A5 shows the ratio of successful jobs that have a similar size to the desired job to all successfully finished jobs; A7 shows the CPU power; and A8 shows the measure of system idleness in the fuzzy range. These nine attributes are evaluated as fuzzy values. We note that, based on the type of job, each attribute is given a weight. These weights have been allocated based on empirical observation and on the effect of each attribute on classification by the DT. The weights are presented in Table 6.1.
Table 6.1 Weights assigned to each attribute

Attribute   High reliability   Real-time    Normal jobs
A1          WH1 = 1            WR1 = 0.7    WN1 = 0.8
A2          WH2 = 0.4          WR2 = 1      WN2 = 0.8
A3          WH3 = 0.7          WR3 = 0.6    WN3 = 0.6
A4          WH4 = 0.9          WR4 = 0.4    WN4 = 0.6
A5          WH5 = 0.5          WR5 = 0.2    WN5 = 0.4
A6          WH6 = 0.3          WR6 = 0.1    WN6 = 0.2
A7          WH7 = 0.5          WR7 = 0.9    WN7 = 0.5
A8          WH8 = 0.4          WR8 = 0.8    WN8 = 0.5
A9          WH9 = 0.1          WR9 = 0.2    WN9 = 0.1

Now, to find the node with the highest reliability relative to the other nodes, we compute the following weighted score for each node and then select the node with the maximum value:

$$\mathrm{Score}_H(\text{node}) = \sum_{i=1}^{9} WH_i \times A_i \qquad (8)$$

We use a similar method for the other types of jobs.

6.5 EXPERIMENTAL RESULTS AND DISCUSSION

We have designed two applications for our approach. The first application is executed on the nodes (the MA). The second application implements a new provider for the GRB and uses the MA results to select the best nodes for jobs. In our experiments, eight resource computing nodes and one server are used to evaluate the performance of this approach. The hardware information is given in Table 6.2. These nodes communicate with the server via the Internet. The MA application is installed on the nodes and the broker provider application is installed on the server. After that, we started to collect the samples. We divide the 24 hours of the day into the following periods: 7-9, 9-12, 12-15, 15-17, 17-20, 20-22, 22-24, 24-2, and 2-7.

Table 6.2 Hardware information

Name     Type of hardware
Node1    Pentium 4 (cache 1 MB), CPU 2.2 GHz (Intel), RAM 256 MB
Node2    Pentium (cache 2 MB), CPU 2.4 GHz (Intel), RAM 512 MB
Node3    Intel Pentium, CPU 3.0 GHz, RAM 1 GB
Node4    Intel Core 2 Duo, CPU 2.16 GHz, RAM 3.49 GB
Node5    Intel Core 2 Duo, CPU 2.16 GHz, RAM 3.49 GB
Node6    Intel Core 2 Duo, CPU 2.16 GHz, RAM 3.49 GB
Node7    Intel Core 2 Duo, CPU 2.16 GHz, RAM 2.9 GB
Node8    HP ProLiant ML370 G4 High Performance, Intel Xeon 3.4 GHz (2 processors), L2 cache, RAM 8 GB
Server   Pentium 4 (cache 2 MB), CPU 3.0 GHz (Intel), RAM 1 GB

When a node connects to the grid (server), the MA immediately inserts a new record into the node's log file. In the first six days we used the MA application but did not use its results in our broker application; moreover, we always sent each job to all available nodes. After that, we activated the broker provider to select only the suitable nodes. On the seventh day we obtained the following results (see Table 6.3) from the nodes available in the morning, in order to execute a high-reliability job of size 4.47 MB with an execution time of almost 18 minutes. As can be seen, all eight nodes were accessible at that moment.

Table 6.3 The computed results between 7:00 and 9:00 o'clock

Node     A1     A2     A3     A4     A5     A6     A7     A8     A9
Node1    0.89   0.9    0.9    0.9    1.00   0.55   0.00   0.54   0.02
Node2    0.87   0.94   0.9    0.85   0.95   0.80   0.21   0.77   0.04
Node3    0.9    0.92   0.9    0.85   1.00   0.67   0.39   0.97   0.05
Node4    0.94   0.95   0.95   1.00   1.00   0.90   0.48   0.87   0.43
Node5    0.95   0.96   1.00   0.90   1.00   0.85   0.48   0.98   0.39
Node6    0.96   0.95   1.00   1.00   1.00   0.70   0.48   0.16   0.38
Node7    0.92   0.93   0.90   0.95   1.00   0.85   0.48   0.80   0.28
Node8    0.80   1.00   0.85   1.00   1.00   1.00   0.78   0.65   1.00

Table 6.3 shows that, in the A2 column, Node8 is the best and Node1 is the worst (in the fuzzy range). When these results were delivered to the server, the broker provider on the server side selected Node4 for this job.
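As a small illustration of this selection step (a sketch of the weighted sum in (8) using the weights of Table 6.1 and rows of Table 6.3, not the deployed broker code):

# High-reliability weights WH1..WH9 from Table 6.1.
WH = [1.0, 0.4, 0.7, 0.9, 0.5, 0.3, 0.5, 0.4, 0.1]

# Fuzzy attribute values A1..A9 per node, e.g. some rows of Table 6.3.
nodes = {
    "Node1": [0.89, 0.9, 0.9, 0.9, 1.00, 0.55, 0.00, 0.54, 0.02],
    "Node4": [0.94, 0.95, 0.95, 1.00, 1.00, 0.90, 0.48, 0.87, 0.43],
    "Node8": [0.80, 1.00, 0.85, 1.00, 1.00, 1.00, 0.78, 0.65, 1.00],
}

def score(attrs, weights):
    # Weighted sum of the nine fuzzy attributes, as in Eq. (8).
    return sum(w * a for w, a in zip(weights, attrs))

ranking = sorted(nodes, key=lambda n: score(nodes[n], WH), reverse=True)
# ranking lists the candidate nodes from best to worst for this job type.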
The job was then sent to this node for execution and, after a short time, it finished successfully on Node4. The priority list of nodes for this job was as follows (high-reliability priority):

Node4 > Node5 > Node8 > Node7 > Node6 > Node3 > Node2 > Node1

If this job had real-time priority, the broker provider application would select the following order:

Node8 > Node4 > Node5 > Node7 > Node3 > Node2 > Node6 > Node1

On the following days, all measurements were based on the broker provider application. After performing 120 measurements, we obtained the following results for executing a job of size 10.24 MB with an approximate execution time of 22 minutes. The following results were sent by each participating node in the 12-15 period:

Table 6.4 The computed results between 12:00 and 15:00 on 21 August (job size 10.24 MB at 12:45 o'clock)

Node     A1      A2     A3     A4     A5     A6     A7     A8    A9
Node1    0.902   0.87   0.95   0.94   0.00   0.78   0.02   0.8   0.45
Node3    0.908   0.93   1.00   0.86   0.39   0.92   0.05   0.8   0.53
Node5    0.961   0.96   0.95   0.89   0.48   0.96   0.39   1.0   0.89
Node6    0.964   0.93   0.95   0.94   0.48   0.80   0.38   0.9   0.66
Node7    0.920   0.94   0.95   0.92   0.48   0.93   0.28   0.6   0.91
Node8    0.810   1.00   0.9    0.89   0.78   0.75   1.00   0.8   1.00

As can be seen, only six nodes were available. This job was considered a real-time job, and thus Node8 was selected as the best node by the proposed application. The selection priority of the nodes was as follows:

Node8 > Node5 > Node6 > Node7 > Node3 > Node1

Figure 6.2 The ratio of successfully finished jobs for both the random method and our provider

If the job had a very-high-reliability priority, the selection priority would be as follows:

Node5 > Node8 > Node6 > Node7 > Node3 > Node1

In this method we choose the best conditions for the job, whereas other methods (for example, random methods) may incur a high risk when selecting a node. The ratio of successful jobs in our method is compared with a random method (Nader and Mohammadreza, 2006) in Figure 6.2; this shows that our method has good performance in the stable state. Finally, we executed a job with an execution time of 27 minutes and a required RAM of 3.71 MB forty times with both our method and the random method (Nader and Mohammadreza, 2006), using 1, 2, and 4 nodes, in order to evaluate the speedup. The obtained results are presented in Figure 6.3. As can be seen, with more freedom of choice the selection becomes more exact and failures decrease.

Figure 6.3 The speedup ratio of the random method and our method for a constant job

The results show that our approach achieves better performance under this strategy. After each measurement, the ratio of successfully finished jobs appears to improve. It is notable that for all jobs smaller than 5 MB with an approximate time of less than 2 minutes, almost all jobs finished successfully.

6.6 CONCLUSION

Instantaneous resource selection for dynamic scheduling over a pool of discovered nodes is a challenging problem. Many methods have been presented for this purpose, but all of them have some restrictions. Our proposed approach is learning-based; it uses instantaneous resource selection to increase the accuracy of resource scheduling and to reduce the extra communication overhead and faults in the selection and computation cycle.
This broker provider application, together with the MA, offers dynamic decisions for accessing any available and appropriate nodes using the main important criteria. The results of our experiments show that this approach performs better than the others and operates according to the user's requirements. Stability is a helpful characteristic of this approach, so faults are nearly predictable. As mentioned above, this approach has particular accuracy in selecting resources. Considering this characteristic, we recommend this method for cases in which sufficient nodes are available.

6.7 REFERENCES

Asgarali, B., Ali, M. and Bahman, A. (2007). A multi versioning scheduling algorithm for grid systems based on hierarchical architecture. In Proceedings of the 7th IADIS International Conference on WWW/Internet, Vila Real, Portugal, Oct 2007.
Barbara, C., Babu, S. and Thyagaraja, K. K. (2001). EZ-Grid: integrated resource brokerage services for computational grids. http://www.cs.uh.edu/ezgrid/.
Christophe, M. and Bernadette, B. M. (2003). Choice of a method for the construction of fuzzy decision trees. The IEEE International Conference on Fuzzy Systems, Paris.
Dumitrescu, C. L. and Ian, F. (2005). GRUBER: a Grid resource SLA broker. In Euro-Par, Portugal, September 2005.
George, K. J. and Bo, Y. (1995). Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall.
Henri, C., Graziano, O., Francine, B. and Richard, W. (2000). The AppLeS parameter sweep template: user-level middleware for the grid. In Proceedings of the ACM/IEEE Conference on Supercomputing, IEEE Computer Society, 2000.
Ivan, R., Julita, C., M. Rosa, B. and Jesús, L. (2005). eNANOS grid resource broker. In European Grid Conference (EGC2005), Springer Berlin/Heidelberg.
James, F., Todd, T., Miron, L., Ian, F. and Steven, T. (2001). Condor-G: a computation management agent for multi-institutional grids. In Proceedings of the 10th IEEE Symposium on High Performance Distributed Computing (HPDC10), San Francisco, CA, Aug 2001.
Janikow, C. Z. (1998). Fuzzy decision trees: issues and methods. IEEE Transactions on Systems, Man and Cybernetics, vol. 28.
Karl, C., Steven, F., Ian, F. and Carl, K. (2001). Grid information services for distributed resource sharing. In 10th IEEE Symposium on High Performance Distributed Computing, San Francisco, California, August 7-9, 2001.
Klaus, K., Rajkumar, B. and Muthucumaru, M. (2002). A taxonomy and survey of grid resource management systems. Software Practice and Experience, 32(2), 135-164.
Motohide, U., Hirotaka, O., Hiroyuki, T., Fumio, K., Umedzu, K. and Junichi, K. (1994). Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems. Department of Systems Engineering and Precision Engineering, Osaka University, Japan, IEEE.
Nader, A. and Mohammadreza, M. (2006). A dynamic method for searching and selecting nodes in peer to peer fashion. 10th conference on computer science in Tehran, IKT2006.
Seok, K. Y., Lok, Y. J., Gyoon, H. J., Jinsoo, K. and Joonwon, L. (2004). Design and implementation of an OGSI-compliant grid broker service. Proc. of CCGrid.
Zadeh, A. L. (1984). Making computers think like people. IEEE Spectrum, 26-32.
7

RELEVANCE FEEDBACK METHOD FOR CONTENT-BASED IMAGE RETRIEVAL

Ali Selamat
Pei-Geok Lim

7.1 INTRODUCTION

The rapid growth of computer technologies and the advent of the World Wide Web have increased the amount and complexity of multimedia information. In general, an image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images (Long et al., 2003). Content-based image retrieval (CBIR) uses visual contents to search for images in large-scale image databases according to users' interests. The visual contents of an image, such as color, shape, texture, and spatial layout, are used to represent and index the image (Long et al., 2003; Crucianu et al., 2004). Recent retrieval systems have incorporated users' relevance feedback to modify the retrieval process in order to generate perceptually and semantically more meaningful retrieval results. In computer-centric CBIR systems, the visual contents of the images are extracted and described by multidimensional feature vectors, which form a feature database. During the retrieval process, the user provides the retrieval system with the visual feature(s) and is required to specify a weight for each feature. Based on the provided features and specified weights, the retrieval system tries to find images similar to the user's query (Rui et al., 1998). The similarities or distances between the feature vectors of the query and those of the images in the database are then calculated (Long et al., 2003). This type of retrieval is performed with an indexing scheme, which provides an efficient way to search the image database. The introduction of the relevance feedback method into CBIR aims to solve two main problems of CBIR: the subjectivity of human perception and the semantic gap between high-level concepts and low-level features (Tao et al., 2006). Regarding the human perception issue, different persons, or the same person in different situations, may perceive the same visual content differently (Rui et al., 1998). For example, one person may be more interested in an image's color feature while another is more interested in the texture feature of the same image. Even if both are interested in texture, the way they perceive the similarity of texture may be quite different (Rui et al., 1998). Therefore, relevance feedback based CBIR should aim to capture the user's preference. Although relevance feedback has improved CBIR performance, another problem arises: users do not like to label many images as feedback to the system (Qin et al., 2008). This results in limited and inaccurate information being returned to the system. In addition, the limited user feedback causes an insufficiency of training samples, i.e. the images labeled by the user during the feedback process (Qin et al., 2008; Tao et al., 2006). Consequently, retrieval accuracy degrades, which influences the performance of CBIR. Additionally, how to incorporate positive and negative examples to refine the query and the similarity measure in relevance feedback is a key issue of CBIR (Long et al., 2003).
Hence, a classifier or statistical learning technique is needed to classify the positive samples (relevant images) and the negative samples (irrelevant images) as two different groups in the feature space (Tao et al., 2006; Hong et al., 2000). For this classification problem, many techniques have been applied to relevance feedback tasks, such as Bayesian inference, boosting, Support Vector Machines (SVM), and many other statistical learning technologies (Qin et al., 2008). Among these classifiers, SVM-based relevance feedback is widely employed in CBIR. The SVM classifier is capable of learning from training data consisting of relevant and irrelevant images marked by users (Zhang et al., 2001). The SVM classifier has high generalization performance without the need to add a priori knowledge, even when the dimension of the input space is very high (Zhang et al., 2001). Moreover, it also performs well on pattern classification problems by minimizing the Vapnik-Chervonenkis dimension and achieving a minimal structural risk (Tao et al., 2006; Hong et al., 2000). Even though SVM substantially improves the performance of relevance feedback based CBIR, the SVM classifier faces an instability problem when the number of training samples is small (Tao et al., 2006; Kim et al., 2007). As a result, the performance of SVM-based relevance feedback becomes poor when the number of labelled positive feedback samples is small (Tao et al., 2006; Kim et al., 2007; Qin et al., 2008), and the accuracy with which an SVM relevance feedback based CBIR system retrieves target images decreases. In conclusion, a modified relevance feedback mechanism needs to be developed in order to increase the performance of CBIR systems.

This chapter consists of five sections. Section 1 presents the introduction to the study and Section 2 gives a literature review of relevance feedback and CBIR. The proposed methodology is discussed in Section 3, and Section 4 presents the experimental results and discussion. The last section explains the conclusion and suggestions for future work.

7.2 RELATED WORK

In general, relevance feedback involves interaction between the human and the computer to refine high-level queries into low-level feature vectors (Rui et al., 1998). High-level queries are the descriptions supplied by the user, such as a keyword or an image, which are understandable and meaningful to a user. Low-level feature vectors, on the other hand, are the low-level features extracted from the user-supplied queries, which are understood only by machines such as computers. In the past, relevance feedback was used in traditional text-based information retrieval systems. Later, the relevance feedback approach was introduced into CBIR to attack the semantic gap between high-level concepts and low-level features, and the human perception subjectivity problem in CBIR; this technique has significantly improved the performance of CBIR. Relevance feedback is a process of automatically adjusting the existing query using the information fed back by the user about the previously retrieved objects (Rui et al., 1998). The user is therefore required to mark the retrieved images as either relevant or irrelevant to the query and to feed the result back to the system. The system then performs retrieval again according to the user feedback in order to obtain a better result.
This process continues until there is no further improvement in the result or the user is satisfied with the result. In fact, this technique does not require the user to provide a precise initial query; rather, it helps estimate the user's ideal query by using the positive (relevant) and negative (non-relevant) training samples fed back by the user. Therefore, relevance feedback based CBIR relies on relevant and non-relevant examples to reformulate the query.

According to Qi and Chang, relevance feedback techniques can be classified into three categories: query re-weighting, query shifting, and query expansion (Qi and Chang, 2007). Query re-weighting assigns a new weight to each feature of the query (Qi and Chang, 2007; Rui and Huang, 1999; Qin et al., 2008; Cheng et al., 2008). Query shifting, also known as query refinement, moves the query to a new point in the feature space (Qi and Chang, 2007; Rui et al., 1998; Qin et al., 2008; Cheng et al., 2008). Both query re-weighting and query shifting apply a nearest-sampling approach to refine the query concept using the user's feedback. Lastly, query expansion uses a multiple-instance sampling approach to learn from the samples around the neighborhood of positively labeled instances (Qi and Chang, 2007).

Traditional relevance feedback mechanisms and several improved relevance feedback mechanisms have been proposed recently and applied to improve CBIR performance. For example, MARS is a learning method that combines both query vector moving and re-weighting techniques to estimate the ideal query parameters and learn from the user feedback. MARS can perform well when the number of training samples is less than the length of the feature vector; however, it is not effective because of its limited ability to model non-linear distance functions (Rui et al., 1997). The problem of MARS can be overcome by using Mindreader, which can model non-linear and quadratic functions. Although Mindreader has a more vigorous estimation process than MARS, it only works well if the number of training samples is much larger than the length of the feature vector (Ishikawa et al., 1998). A novel relevance feedback technique proposed by Rui and Huang is capable of solving the constrained optimization problem faced by MARS and Mindreader (Rui and Huang, 1999); it achieves the best performance in all retrieval conditions owing to its optimal solutions for query estimation. However, most of the previous work still does not reach a satisfactory level, and problems such as weight adjustment, limited user feedback, and similarity measurement remain. Cheng et al. stated that traditional relevance feedback systems adjust the weight of each feature extracted from the query image (if the query has more than one visual feature) either manually, by the user, or using values predetermined and fixed by the system (Cheng et al., 2008). Therefore, a two-level relevance feedback method has been proposed to let the user rank the images in relevance order according to their interest (Cheng et al., 2008).
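The query re-weighting and query shifting ideas discussed above can be made concrete with a minimal Rocchio-style sketch (generic update rules with assumed parameter values; this is not the exact algorithm of MARS, Mindreader, or any other cited system):

import numpy as np

def shift_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    # Query shifting: move the query toward the mean of the relevant
    # examples and away from the mean of the irrelevant ones.
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q -= gamma * np.mean(irrelevant, axis=0)
    return q

def reweight_features(relevant, eps=1e-6):
    # Query re-weighting: features that vary little across the relevant
    # examples are treated as important and receive larger weights.
    std = np.std(relevant, axis=0)
    w = 1.0 / (std + eps)
    return w / w.sum()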
Besides, SCLP (subspace clustering and label propagation) proposes two new units, representative image selection and label propagation, to incorporate into the traditional relevance feedback method in order to overcome the problems of insufficient training samples and limited user feedback (Qin et al., 2008). Even though abundant effort has been contributed by researchers all over the world, there is still room to improve the performance of CBIR.

7.3 METHODOLOGY

In this section, the proposed methodology is discussed. Basically, the relevance feedback based CBIR process starts with the user, who is requested to provide an image query to the system. After that, the system analyzes the query image and retrieves a set of ranked images according to a similarity metric. Then a relevance feedback process takes place, which is one way of identifying what the user is looking for in the current retrieval session by including the user in the retrieval loop (Crucianu et al., 2004). In the relevance feedback process, the user is requested to mark the images as either relevant or irrelevant to the query. The labeled images are fed back to the system, and the labeled information is used to improve the result of the next retrieval. Therefore, a revised ranked list of images is presented to the user in the next iteration (Das and Ray, 2007). The process of relevance feedback based CBIR is shown in Figure 7.1.

Figure 7.1 Process of relevance feedback based CBIR

Figure 7.2 Proposed methodology

The proposed methodology, shown in Figure 7.2, consists of three main parts: pre-processing, relevance feedback, and classification. Two units, representative image selection and weight ranking, have been added to the traditional relevance feedback mechanism. Generally, a traditional relevance feedback based CBIR method consists of image retrieval, user labeling, and classification units. The two added units are intended to solve some of the identified problems, namely limited information from the user, weight adjustment, and similarity measurement. The representative image selection unit helps the system select a set of informative images from the image database that reflect the user's interest. The weight ranking unit helps the system determine which feature method (for example, color or texture) is of more concern to the user; hence the system can adjust the weight of each feature more precisely and indirectly improve the accuracy of CBIR. Further details of the preprocessing, similarity measurement, relevance feedback, and classification are explained in the following subsections.

7.3.2 Preprocessing

In this study, both the query image and the database images are preprocessed to filter out unnecessary noise. The images are then resized to 128 x 128 pixels in order to reduce the computational complexity. Subsequently, the processed images go through the two main steps of preprocessing: image segmentation and feature extraction. This preprocessing transforms the images from a form understood by humans into another form that can be understood by machines such as computers. First, the image is segmented into several regions using an image segmentation technique.
In this study, a pixel labelling technique based on Gaussian Mixture Models (GMM), also known as MAP (maximum a posteriori) segmentation (Blekas et al., 2005), is used. With MAP segmentation, the image colors are clustered and quantized into several classes (Liu et al., 2008). At the end of segmentation, a class-map of the image is obtained (Liu et al., 2008). Figure 7.3 shows how an image is segmented into several regions using MAP segmentation.

Figure 7.3 The process of MAP (maximum a posteriori) segmentation.

As shown in Figure 7.4, each class label in the class-map is treated as a region; hence there are six regions for the six class labels in Figures 7.3 and 7.4. In the feature extraction step, low-level image features such as color and texture are extracted from each image region. For each region, the color feature is the average RGB value of all the pixels in that region, which is also the dominant color feature of that region (Liu et al., 2008). Three dimensions of color feature are therefore extracted per region. Figure 7.4 shows the color features extracted for each region of a particular image.

Figure 7.4 Color features for each region within a particular image.

In general, the selection of image features is a fundamental issue in designing a content-based image retrieval system. No single feature can perfectly represent the whole content of an image, so a combination of two or more features represents the image content best. To limit complexity, this study chooses two types of features, color and texture, to represent the image regions.

Texture features for each region are then extracted using the Haar wavelet filter and the Discrete Wavelet Transform (DWT). In this study, the texture features are extracted from arbitrarily-shaped regions (Liu et al., 2008). The texture feature extraction for each region proceeds as follows:

1. Scale down the given 2-D image into four sub-images whose wavelets lie in three orientations: horizontal, vertical and diagonal (Zhang et al., 2001).

2. Decompose the image up to four decomposition levels using the Haar wavelet filter and the DWT. According to Zhou et al. (2000), a three- or four-level decomposition is expected to extract more robust features.

3. Calculate the mean $\mu_{mn}$ and standard deviation $\sigma_{mn}$ of the transform coefficients in each sub-image (Manjunath et al., 2001). The mean and standard deviation values are used as the texture features of the region:

$$\mu_{mn} = \iint \lvert W_{mn}(x, y) \rvert \, dx \, dy \qquad (1)$$

$$\sigma_{mn} = \sqrt{\iint \bigl( \lvert W_{mn}(x, y) \rvert - \mu_{mn} \bigr)^2 \, dx \, dy} \qquad (2)$$

where m is the scale of the image, n is the orientation of the image, and $W_{mn}(x, y)$ is the energy coefficient of the filtered image output.

4. Combine the wavelet channels as shown in Figure 7.5 to obtain rotation-invariant and scale-invariant texture criteria.

5. Quantize the coefficients from the wavelet channels or channel combinations.

6. Calculate the linear sums of the channels' coefficients for the rotation-invariant and scale-invariant texture criteria, as shown in Figure 7.5.
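Steps 1 to 3, together with the channel sums of step 6, can be sketched with a standard wavelet library. The snippet below is a minimal illustration under stated assumptions: PyWavelets is an assumed dependency (the chapter does not name an implementation), the region is passed as a rectangular array, and region masking plus the exact quantization of step 5 are omitted. It produces the 24 sub-band statistics and 7 combined-channel energies that make up the 31-dimension texture vector described next.

```python
# Minimal sketch (assumptions: PyWavelets installed; rectangular region array;
# step-5 quantization and arbitrary-shape masking are omitted).
import numpy as np
import pywt

def texture_features(region, levels=4):
    coeffs = pywt.wavedec2(region.astype(float), "haar", level=levels)
    details = coeffs[1:]                  # (cH, cV, cD) per scale: 12 channels
    stats, energy = [], np.zeros((levels, 3))
    for s, bands in enumerate(details):
        for o, band in enumerate(bands):
            stats += [np.mean(np.abs(band)), np.std(np.abs(band))]  # mu, sigma
            energy[s, o] = np.sum(band ** 2)                         # channel energy
    # Linear channel sums: 4 rotation-invariant (across orientations, per
    # scale) plus 3 scale-invariant (across scales, per orientation) = 7.
    combos = list(energy.sum(axis=1)) + list(energy.sum(axis=0))
    return np.array(stats + combos)       # 24 + 7 = 31 dimensions
```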
The channel combination descriptor is defined as a description of arbitrary combinations of wavelet channels. The main idea is to linearly sum the channel energies; the assumption is that this represents the texture more compactly while retaining the important information about it (Ohm et al., 2000). Figure 7.5 illustrates the extraction of the texture features, i.e. the means and standard deviations together with the rotation-invariant and scale-invariant quantities obtained with the channel combination descriptor. Figure 7.5(a) shows the channel groups for the rotation-invariant and scale-invariant criteria. Figure 7.5(b) shows how wavelet channels 10+11+12, 7+8+9, 4+5+6 and 1+2+3 are combined to fulfil the rotation-invariant criterion, and channels 1+4+7+10, 3+6+9+12 and 2+5+8+11 to fulfil the scale-invariant criterion. Lastly, Figure 7.5(c) shows the quantization and up-sampling of the channel energies. In total, a 31-dimension texture feature is extracted from each region, so together with the color feature each region yields a 34-dimension feature vector. All extracted features are used in the feature similarity measure.

Figure 7.5 Texture features for each region within a particular image (Ohm et al., 2000).

7.3.3 Feature Similarity Measure

This subsection measures the similarity between images by comparing the differences between their features. In CBIR, features such as color and texture represent the content of an image, so they can be used to show how similar two images are. The similarity comparison here is performed on the visual content descriptors produced by the DWT and RGB feature extraction techniques. A suitable dissimilarity formula is also needed for comparing region-based images (Greenspan et al., 2000; Liu et al., 2008). In this study, the Earth Mover's Distance (EMD) is used as the feature similarity measure. The EMD computes the dissimilarity between two sets of regions and returns the correspondence between them (Greenspan et al., 2000).

Given a query image $I_Q$ with m segmented regions, $I_Q = \{(R_Q^i, w_Q^i) \mid i = 1, \ldots, m\}$, and a target image $I_T$ with n segmented regions, $I_T = \{(R_T^j, w_T^j) \mid j = 1, \ldots, n\}$, $R_Q^i$ and $R_T^j$ are the i-th and j-th regions of the query and target images, and $w_Q^i$ and $w_T^j$ are the weights of the regions. For each iteration of image retrieval, the weight of a region is defined as the ratio of the region size to the image size (Liu et al., 2008). The initial weight of the color and texture features is $1/N$, where N is the total number of image features. The EMD distance measurement is:

$$EMD(I_Q, I_T) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij} \, d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij}} \qquad (3)$$

where $d_{ij}$ is the ground distance between regions $R_Q^i$ and $R_T^j$. It is the Euclidean distance between the region feature vectors $F_Q^i = \{R_Q^i, G_Q^i, B_Q^i, \mu_{00Q}^i, \ldots, \mu_{32Q}^i, \sigma_{00Q}^i, \ldots, \sigma_{32Q}^i, c_{1Q}^i, \ldots, c_{7Q}^i\}$ and $F_T^j = \{R_T^j, G_T^j, B_T^j, \mu_{00T}^j, \ldots, \mu_{32T}^j, \sigma_{00T}^j, \ldots, \sigma_{32T}^j, c_{1T}^j, \ldots, c_{7T}^j\}$:

$$d_{ij} = d(F_Q^i, F_T^j) = w_c \sqrt{(R_Q^i - R_T^j)^2 + (G_Q^i - G_T^j)^2 + (B_Q^i - B_T^j)^2} + w_t \sqrt{\sum_{s=0}^{3} \sum_{o=0}^{2} e + \sum_{k=1}^{7} (c_{kQ}^i - c_{kT}^j)^2} \qquad (4)$$

where $w_c$ is the weight of the color feature, $w_t$ is the weight of the texture feature, and

$$e = (\mu_{soQ}^i - \mu_{soT}^j)^2 + (\sigma_{soQ}^i - \sigma_{soT}^j)^2$$
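As an illustration, once the ground distances of (4) are known, the EMD in (3) can be posed as a small transportation linear program that enforces the flow constraints listed in (5) just below. This sketch is a hypothetical implementation, not the one used in this study; scipy is an assumed dependency, and linprog's default nonnegative bounds supply the $v_{ij} \ge 0$ constraint.

```python
# Hypothetical sketch of the EMD of equation (3) as a transportation LP.
# d: (m, n) ground-distance matrix from equation (4);
# wq, wt: 1-D numpy arrays of query/target region weights.
import numpy as np
from scipy.optimize import linprog

def emd(d, wq, wt):
    m, n = d.shape
    A_ub, b_ub = [], []
    for i in range(m):                         # sum_j v_ij <= wq_i
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
        A_ub.append(row); b_ub.append(wq[i])
    for j in range(n):                         # sum_i v_ij <= wt_j
        row = np.zeros(m * n); row[j::n] = 1
        A_ub.append(row); b_ub.append(wt[j])
    total = min(wq.sum(), wt.sum())            # total flow, as in (5)
    res = linprog(d.ravel(), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.ones((1, m * n)), b_eq=[total])
    return res.fun / total                     # normalised as in equation (3)
```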
The flows $v_{ij}$ in formula (3) are the weights assigned to the distances $d_{ij}$; they are determined by minimizing $EMD(I_Q, I_T)$ subject to the following constraints (Liu et al., 2008):

$$v_{ij} \ge 0, \quad 1 \le i \le m, \; 1 \le j \le n;$$

$$\sum_{j=1}^{n} v_{ij} \le w_Q^i, \quad 1 \le i \le m; \qquad \sum_{i=1}^{m} v_{ij} \le w_T^j, \quad 1 \le j \le n;$$

$$\sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij} = \min\!\Bigl( \sum_{i=1}^{m} w_Q^i, \; \sum_{j=1}^{n} w_T^j \Bigr) \qquad (5)$$

These constraints state that the smaller of the total weights of the query regions and the target regions is taken as the total flow distributed over the distances $d_{ij}$ between region features. At the end of the feature similarity measure, a ranked list of EMD distances between the database images and the query image is generated. Ideally, comparing region features helps the system retrieve images that contain the same concept as the query image: the smaller the distance between the regions of the query image and a target image, the closer the concepts they share.

7.3.4 Relevance Feedback

As mentioned above, the relevance feedback mechanism is repeated iteratively until the user is satisfied with the retrieval result. The relevance feedback part consists of five units: image retrieval, representative image selection, user labeling, weight ranking, and SVM learning and classification.

7.3.4.1 Image Retrieval

A ranked list of EMD distances to the database images is the input to this unit, which produces two different kinds of output. At the start of image retrieval, the top N images from the ranked list are displayed for the user labeling process, where N is the number of images the user is asked to mark. In subsequent iterations, an estimated possibly positive image set (EPPIS) is selected and output. The EPPIS is a dynamic image collection, a sub-image set with consistent characteristics. At the beginning of retrieval it contains the images closest to the query point; thereafter, images classified as positive with high confidence are included in it. This is done for the sake of the newly proposed representative image selection unit (Qin et al., 2008).

7.3.4.2 Representative Image Selection

The representative image selection unit aims to select a set of representative images from the EPPIS and retrieve them for the user labeling process; the total number of images retrieved for user labeling is therefore the size of the representative image set. The representative images are a subset of the EPPIS that has minimum information loss and does not contain too much redundancy. The representative image set is defined as (Qin et al., 2008):

$$R = \arg\min_{Y} \{ d(EPPIS, Y) \mid Y \subseteq EPPIS, \; N_Y = N_R \} \qquad (6)$$

where $N_Y$ is the number of elements in Y and $N_R$ is the number of elements in the representative image set R.

Basically, all the images, including the user's query image and the database images, are partitioned from the perspective of features, i.e. into regions with their color and texture features. In representative image selection, the informative images selected from the EPPIS are those whose regions have the smallest distance to the query image. Figure 7.6 shows the algorithm of representative image selection.
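A rough, hypothetical sketch of such a greedy selection is given below, anticipating the assumptions listed after it: each query region receives a quota of picks proportional to its weight (assumption 4), and picks that duplicate an earlier representative are skipped (a simplification of the element-element distance check of assumption 3). It is an illustration, not the algorithm of Figure 7.6.

```python
# Rough, hypothetical sketch of representative image selection (formula 6).
# eppis: list of images, each given as a list of region feature vectors;
# picks_per_region: weight-proportional quota for each query region.
import numpy as np

def select_representatives(query_regions, eppis, picks_per_region):
    reps = []
    for q, quota in zip(query_regions, picks_per_region):
        # rank EPPIS images by their best-matching region's distance to q
        ranked = sorted(range(len(eppis)), key=lambda k: min(
            np.linalg.norm(np.asarray(q) - np.asarray(r)) for r in eppis[k]))
        taken = 0
        for k in ranked:
            if k in reps:            # duplicate of an earlier pick: skip it
                continue
            reps.append(k)
            taken += 1
            if taken == quota:
                break
    return reps
```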
The following assumptions underlie this selection:

1. It is possible for a region, say region 1 of the query image $I_Q$, to carry the same concept as region 2 of target image $I_{T1}$ and region 1 of target image $I_{T2}$. Therefore, each region of the query image has to be compared with every region of every target image in the EPPIS.

2. Using the Euclidean measurement on region features, the target image whose region features have the smallest distance is included in the representative image set. For example, if the color feature of region 3 of target image $I_{T1}$ has the smallest distance to the color feature of region 1 of the query image, compared with the other images in the EPPIS, then $I_{T1}$ is taken as a representative image.

3. If representative images were selected from the EPPIS independently, following point 2 above, the same representative image could be selected for two different regions of the query image. To avoid this, the element-element distance $d^{(m)}$ is used to check whether a newly selected representative image is too close to any of the representative images already in $R^{(m)}$, where m is the current number of representative images. If it is, it is deleted and the selection for that query region is repeated, so that the target image with the second smallest distance to the query image is selected instead. This process continues until the representative image set becomes stable.

4. Each region and its corresponding features are assigned a particular weight; that is, the regions are not all equally important, and the selection of representative images depends on the assigned weights. For example, if there are two regions with weights 0.6 and 0.4 and 10 representative images are to be selected from the EPPIS, then 0.6 x 10 = 6 images are selected for region 1 and 0.4 x 10 = 4 images for region 2 of the query image.

Figure 7.6 The algorithm of the representative image selection.

7.3.4.3 User Labeling

In the user labeling process, the user marks the retrieved representative images as either relevant or irrelevant to the desired target images, and the labels are fed back to the system for further analysis. In this study, the user is required to give a sequence of images, known as an image ranking sequence, which is fed back to the CBIR system; that is, the user ranks the relevant images by their similarity to the query image (Cheng et al., 2008). In the user labeling unit, two user-defined preference relations, > and =, are used to express the image ranking sequence. For example, if the similarity degrees of $I_{T1}$ and $I_{T2}$ are the same, this is denoted $I_{T1} = I_{T2}$; if the similarity degree of $I_{T1}$ is greater than that of $I_{T2}$, it is denoted $I_{T1} > I_{T2}$. In other words, the image ranking sequence runs from the most similar to the least similar to the user's desired image, leftmost to rightmost. According to Cheng et al. (2008), several criteria show that the image ranking sequence gives more meaningful feedback to a CBIR system:

1. The user is able to provide more information about the relevant images through the image ranking sequence.
Therefore, more information is obtained from the same number of judged examples, and the number of relevance feedback iterations can be reduced (Cheng et al., 2008).

2. The user can easily weight the various features (regions and region features) according to their interest. It is difficult for a user to define a numeric similarity value, but easy to distinguish which images are more similar than others (Cheng et al., 2008). Thus the user ranks the similarity degree of each relevant image in the image ranking sequence.

3. From the user's image ranking sequence, the system can estimate which feature methods are closer to the user's interest, by analysing how close each feature method comes to the ranking sequence returned by the user. The weight of each feature is then adjusted according to this analysis (Cheng et al., 2008).

7.3.4.4 Weight Ranking

In the weight ranking unit, the CBIR system obtains the label of each representative image retrieved in the user labeling process. The images returned in the image ranking sequence are labeled as relevant and the rest as irrelevant. The system then analyses the relevant images, finds the region areas they have in common, and calculates and updates the weights of the regions and region features using the formulas below.

For the region features, i.e. the color and texture features, the weight adjustment uses formula (7) (Qin et al., 2008). For any given query image, the corresponding region feature weights are $F = \{w_c, w_t\}$, where $w_c$ is the weight of the color feature and $w_t$ the weight of the texture feature:

$$w_f = \frac{1 / \gamma(f)}{\sum_{k=1}^{|F|} 1 / \gamma(k)} \qquad (7)$$

where $\gamma(f) = \sum_{x \in L, \, y \in L} \bigl( d^{(f)}(x, y) \bigr)^2$ and $d^{(f)}$ is the element-element distance metric for region feature f among the relevant labeled images L.

Each region of the images is ranked by the system in a ranking sequence according to its distance to the query image. The ranking sequence generated by a feature method is closer to the sequence returned by the user when that feature is closer to the user's perception. Suppose, for example, that the ranking sequence for region 1 is i4 > i3 > i2 > i1, the ranking sequence for region 2 is i1 > i2 > i4 > i3, and the ranking sequence returned by the user is i1 > i2 > i3 > i4, where i indexes the database images. Then the weight of region 1 is decreased and the weight of region 2 increased during weight adjustment. According to Cheng et al. (2008), the importance of each region is based on this comparison of sequences. To adjust the region weights, formula (8) is used to evaluate how close two sequences are (Cheng et al., 2008):

$$R_{norm}(\Delta_{system}, \Delta_{user}) = \frac{1}{2} \left( 1 + \frac{S^+ - S^-}{S^+_{max}} \right) \qquad (8)$$

where
$\Delta_{system}$ is the rank ordering of the labeled relevant images induced by the similarity values computed by the image retrieval system;
$\Delta_{user}$ is the rank ordering of the labeled relevant images given by the user;
$S^+$ is the number of image pairs where a better image is ranked ahead of a worse one by $\Delta_{system}$;
$S^-$ is the number of image pairs where a worse image is ranked ahead of a better one by $\Delta_{system}$; and
$S^+_{max}$ is the maximum possible number of $S^+$, obtained from $\Delta_{user}$.
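As a small illustration of (8), the sketch below counts concordant and discordant pairs between the system's ordering and the user's ordering. It is a hypothetical helper, assuming both orderings are lists of the same image identifiers with the best image first and with distinct ranks.

```python
# Hypothetical sketch of the Rnorm comparison in equation (8); both rankings
# are lists of image ids, best first, over the same labeled relevant images.
from itertools import combinations

def rnorm(system_rank, user_rank):
    user_pos = {img: pos for pos, img in enumerate(user_rank)}
    s_plus = s_minus = 0
    for a, b in combinations(system_rank, 2):   # a is ranked ahead of b
        if user_pos[a] < user_pos[b]:
            s_plus += 1                          # agrees with the user
        elif user_pos[a] > user_pos[b]:
            s_minus += 1                         # contradicts the user
    s_max = len(user_rank) * (len(user_rank) - 1) // 2   # maximum possible S+
    return 0.5 * (1 + (s_plus - s_minus) / s_max)
```

The per-region values $r_1, \ldots, r_m$ computed this way would then be normalised by formula (9) below to give the new region weights.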
According to Cheng et al. (2008), the $R_{norm}$ calculation is based on the ranking of image pairs in $\Delta_{system}$ relative to the ranking of the corresponding image pairs in $\Delta_{user}$. $R_{norm}$ ranges from 0 to 1, and the value 1 indicates that the system's rank ordering is the same as the ranking provided by the user (Cheng et al., 2008). As a result, $R_{norm}$ can indicate which region of the query image the user is paying attention to. For a given query image $I_Q$ with m segmented regions, the system estimates the $R_{norm}$ of each region, $R_{norm} = (r_1, r_2, \ldots, r_m)$, and the new weight of each region is defined by formula (9):

$$w_Q^i = \frac{r_i}{\sum_{j=1}^{m} r_j} \qquad (9)$$

7.3.4.5 Support Vector Machine (SVM) Learning and Classification

A Support Vector Machine (SVM) is used to classify the unlabeled images of the image database into two classes, relevant and irrelevant. First, the user-labeled images are used to train the SVM; then all the database images not yet labeled by the user go through the SVM classification process. In this study, a two-class SVM is used for classification, and the images classified as relevant are retrieved for the next iteration. The feature distance measurements, i.e. the feature distances of the labeled or unlabeled sample images to the query image, are used as the index values of the sample images during training and classification. The relevance feedback mechanism is repeated iteratively until the user is satisfied with the retrieval result; in other words, until the system has successfully formulated the user's area of interest and correctly re-weighted each region and its region features.

7.4 EXPERIMENT SETUP

This section describes the experiment and its results in detail. The experiment used five categories of database images: animal, building, flower, fruits and natural scene. The image database holds 1000 images in total, with 200 images per category, as shown in Table 7.1. The main purpose of this dataset is to evaluate the performance of the proposed relevance feedback based CBIR system. Table 7.2 shows the hardware used in the experiment, and Table 7.3 the default parameters used in the experiments.

In this study, the information retrieval measurements precision, recall and F1 are used to evaluate the performance of the CBIR system. Precision (P), recall (R) and F1 are three standard measurements commonly used to evaluate the quality of results in the information retrieval domain. The precision value measures the exactness of the system's performance, whereas recall measures the completeness of the data retrieved by the system; F1 evaluates the balance of precision and recall. The formulas of these three standard measurements are given below, with the notation explained in Table 7.4.

Table 7.1 Categories of image database.

Category   Category Name   Amount
1          Animal          200
2          Building        200
3          Flower          200
4          Fruits          200
5          Natural Scene   200
Table 7.2 Hardware specification used in the experiment.

No   Hardware                  Specification
1    Processor                 Intel Centrino Dual Core
2    Memory (SDRAM 667 MHz)    DDRII 2GB
3    Operating System          MS Windows Vista

Table 7.3 Default parameters used in the experiments.

Condition                             Parameter
Number of retrieved images            20 images
Size of the EPPIS set                 100 images
Number of iterations                  6 iterations
Training samples for SVM              all labeled images
Classification samples for SVM        unlabeled images = entire database images minus all labeled images
Type of SVM                           binary SVM with default parameter setting for a linear kernel

Table 7.4 Description of the information retrieval measurements.

              Human Yes   Human No
System Yes    a           b
System No     c           d

$$precision(P) = \frac{a}{a + b} \qquad (10)$$

$$recall(R) = \frac{a}{a + c} \qquad (11)$$

$$F1 = \frac{2PR}{P + R} \qquad (12)$$
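For concreteness, the three measures can be computed directly from the contingency counts of Table 7.4, as in this small sketch:

```python
# Minimal sketch of the measures in (10)-(12), using the counts of Table 7.4
# (a: relevant and retrieved, b: retrieved but not relevant, c: relevant but
# not retrieved).
def precision_recall_f1(a, b, c):
    p = a / (a + b)
    r = a / (a + c)
    f1 = 2 * p * r / (p + r)
    return p, r, f1
```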
7.4.2 Experiment Result and Discussion

In this subsection the experiment results are discussed. Figures 7.7 to 7.9 show the precision, recall and F1 results of the proposed relevance feedback method for the five categories of the image database (animal, building, flower, fruits and natural scene), and Figure 7.10 shows the average performance over the five categories.

According to Figure 7.7, the fruits category shows the highest precision in almost all retrieval iterations, yet its precision decreases from the first iteration to the sixth. This may be because several kinds of fruit, such as apple, orange and banana, were collected under the fruits category, which makes its results unstable. The flower category shows an increasing pattern from the first to the sixth iteration, because the collected flower images have similar and consistent characteristics.

Figure 7.7 Precision rate for five categories of image using the proposed relevance feedback method.

As illustrated in Figures 7.8 and 7.9, the recall and F1 rates show an increasing pattern for all categories of the image database, with the flower and fruits categories showing the highest rates among the five. The main reason is that almost all the images in the fruits and flower categories behave consistently and contain no unnecessary noise, in particular no unwanted backgrounds or objects that do not describe the category.

According to Figure 7.10, the recall and F1 rates of the proposed method increase from the first retrieval to the sixth iteration; hence it can be concluded that the proposed relevance feedback method achieves better performance after the sixth iteration. This increasing pattern also indicates that the proposed relevance feedback based CBIR system is able to select the positive images from the database: the representative image selection unit chooses the significant and informative positive images rather than simply following the result of the similarity measurement. The experiment results likewise show that the weight ranking unit re-weights the features more precisely, so the proposed method retrieves more relevant images as the relevance feedback iterations increase. Lastly, the experiments show that incorporating the representative image selection and weight ranking units increases the performance of the CBIR system.

Figure 7.8 Recall rate for five categories of image using the proposed relevance feedback method.

Figure 7.9 F1 rate for five categories of image using the proposed relevance feedback method.

Figure 7.10 Average precision, recall and F1 rate for five categories of image database.

7.5 CONCLUSION

The experiment results show that the proposed method achieves better performance after the sixth retrieval iteration. The experiments also show that solving the limited user feedback and weight adjustment issues improves CBIR performance; the proposed method is thus able to address these CBIR problems. However, the performance of the proposed method is still not outstanding, and considerable effort is still needed to raise the performance of CBIR and to reduce the number of iterations required in the relevance feedback process. The performance could potentially be improved with a better preprocessing method than the combination of MAP segmentation (Blekas et al., 2005), the Haar filter with DWT (Smith and Chang, 1996; Manjunath et al., 2001) and wavelet channel combination (Ohm et al., 2000) used here. Preprocessing is therefore a candidate for future study, since the performance of CBIR may vary with different preprocessing methods, and more significant data could be extracted with a more significant feature extraction technique. In conclusion, the performance of the proposed method may be improved further if additional feature extraction methods, such as shape, spatial layout and geometry features, are used.

7.6 REFERENCES

Blekas, K., Likas, A., Galatsanos, N. and Lagaris, I. (2005). A Spatially-Constrained Mixture Model for Image Segmentation. IEEE Transactions on Neural Networks, 16(2), 494-498.

Cheng, P. C., Chien, B. C., Ke, H. R. and Yang, W. P. (2008). A two-level relevance feedback mechanism for image retrieval. Expert Systems with Applications, 34(3), 2193-2200.

Crucianu, M., Ferecatu, M. and Boujemaa, N. (2004). Relevance feedback for image retrieval: a short survey. Report of the DELOS2 European Network of Excellence (6th Framework Programme).

Das, G. and Ray, S. (2007). A comparison of relevance feedback strategies in CBIR. Proceedings of the 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), 100-105.

Greenspan, H., Dvir, G. and Rubner, Y. (2000). Region Correspondence for Image Matching via EMD Flow. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'00) (June 16, 2000).
IEEE Computer Society, Washington, DC, 27.

Hong, P. Y., Tian, Q. and Huang, T. S. (2000). Incorporate support vector machines to content-based image retrieval with relevance feedback. Proceedings of the 2000 International Conference on Image Processing, Vol. 3, 750-753.

Ishikawa, Y., Subramanya, R. and Faloutsos, C. (1998). MindReader: Querying databases through multiple examples. Proceedings of the 24th VLDB Conference, New York.

Kim, D. H., Song, J. W., Lee, J. H. and Choi, B. G. (2007). Support Vector Machine Learning for region-based image retrieval with Relevance Feedback. ETRI Journal, 29(5), 700-702.

Liu, Y., Zhang, D. S. and Lu, G. (2008). Region-Based Image Retrieval with High-Level Semantics using Decision Tree Learning. Pattern Recognition, 41(8), 2554-2570.

Long, F., Zhang, H. J. and Feng, D. D. (2003). Fundamentals of Content-Based Image Retrieval. In Multimedia Information Retrieval and Management: Technological Fundamentals and Applications. Springer-Verlag, Berlin.

Manjunath, B., Wu, P., Newsam, S. and Shin, H. (2001). A texture descriptor for browsing and similarity retrieval. Signal Processing: Image Communication, 16, 33-43.

Ohm, J., Bunjamin, F., Liebsch, W., Makai, B., Müller, K., Smolic, A. and Zier, D. (2000). A set of visual feature descriptors and their combination in a low-level description scheme. Signal Processing: Image Communication, 16, 157-179.

Qin, T., Zhang, X. D., Liu, T. Y., Wang, D. S., Ma, W. Y. and Zhang, H. J. (2008). An active feedback framework for image retrieval. Pattern Recognition Letters, 29(5), 637-646.

Qi, X. J. and Chang, R. (2007). Image retrieval using transaction-based and SVM-based learning in relevance feedback sessions. In M. Kamel and A. Campilho (Eds.), ICIAR 2007, LNCS 4633, 638-649. Springer-Verlag, Berlin Heidelberg.

Rui, Y. and Huang, T. S. (1999). A novel relevance feedback technique in image retrieval. Proceedings of the 7th ACM Conference on Multimedia, 67-70.

Rui, Y., Huang, T. S. and Mehrotra, S. (1997). Content-based image retrieval with relevance feedback in MARS. Proceedings of the IEEE International Conference on Image Processing, Vol. 2, 815-818.

Rui, Y., Huang, T. S., Ortega, M. and Mehrotra, S. (1998). Relevance feedback: A power tool in interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5), 644-655.

Smith, J. R. and Chang, S. F. (1996). Automated binary texture feature sets for image retrieval. Proceedings of ICASSP-96.

Tao, D. C., Tang, X. O., Li, X. L. and Wu, X. D. (2006). Asymmetric Bagging and Random Subspace for Support Vector Machines-Based Relevance Feedback in Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7).

Zhang, L., Lin, F. and Zhang, B. (2001). Support Vector Machine Learning for Image Retrieval. Proceedings of the IEEE International Conference on Image Processing, Vol. 2, 721-724.

Zhou, S. H., Venkatesh, Y. V. and Ko, C. C. (2000). Wavelet-based Texture Retrieval and Modeling Visual Texture Perception. Master of Engineering Thesis, National University of Singapore (NUS).

8

DETECTING BREAST CANCER USING TEXTURE FEATURES AND SUPPORT VECTOR MACHINE

Al Mutaz Abdalla
Safaai Deris
Nazar Zaki

8.1 INTRODUCTION

Localized textural analysis of breast tissue on mammograms has recently gained considerable attention from researchers studying breast cancer detection.
Despite research progress on the problem, detecting breast cancer from textural features has not been investigated in depth. In this work, we study breast cancer detection based on statistical texture features using a Support Vector Machine (SVM). A set of textural features was computed for 120 digital mammographic images from the Digital Database for Screening Mammography, and these features were then used in conjunction with SVMs to detect breast cancer tumors.

8.2 RELATED WORK

Breast cancer is the second leading cause of death for women around the world; an average woman has a one-in-eight (about 12%) chance of developing breast cancer during her life. Early detection of breast cancer by means of screening mammography has been established as an effective way to reduce the mortality resulting from the disease (Smith, 1995; Tabar, 1995). Despite significant recent progress, recognizing suspicious abnormalities in digital mammograms remains a difficult task, for at least several reasons. First, mammography provides relatively low-contrast images, especially in the case of dense or heavy breasts. Second, symptoms of abnormal tissue may remain quite subtle; for example, spiculated masses that may indicate malignant tissue within the breast are often difficult to detect, especially at an early stage of development (Arodz, 2005).

The recent use of localized textural features with machine learning (ML) classifiers has established a new research direction in breast cancer detection. Localized texture analysis of breast tissue on mammograms remains an issue of major importance in mass characterization; however, in contrast to other mammographic diagnostic approaches, it has not been investigated in depth, owing to its inherent difficulty and fuzziness (Mavroforakis, 2006). Many studies have focused on the general issue of textural analysis of mammographic images in the context of detecting tumor boundaries and micro-calcifications (Lisboa, 2002; Arivazhagan, 2003), but none of these studies has considered the classification of normal, benign and malignant cases all together. In this work, we study the classification of a total of 120 digital mammographic images containing 60 normal, 30 benign and 30 malignant cases.

8.3 MATERIALS AND METHOD

A set of statistical texture feature functions was applied to specified regions of interest in 120 digitized mammograms, with measurements made on co-occurrence matrices in two different directions. In the first step, the sample of 120 mammographic images was randomly selected from the Digital Database for Screening Mammography (DDSM) (Kwok, 2003); similar data will be used to test the performance of different machine learning techniques. The DDSM is a resource for the mammographic image analysis research community; it contains approximately 2,500 studies, each including two images of each breast along with associated patient information. The 120 cases were selected randomly from the DDSM across age groups ranging from 35 to 80.
The selected cases comprise 60 normal, 30 benign and 30 malignant cases, digitized at 50 micrometers and 12-bit gray level. In the second stage, the suspicious Region of Interest (ROI) was selected, as shown in Figure 8.1. In the third stage, the features were selected from each ROI and statistical texture features calculated for each ROI. Figure 8.2 illustrates the main steps in classifying the extracted texture.

Figure 8.1 Digitized mammogram showing one manually segmented region.

Figure 8.2 Diagram showing the main steps in texture analysis and classification.

8.4 FEATURE EXTRACTION

The feature extraction procedure relies on texture, which is the main descriptor for all the mammograms. In this work, we concentrate on statistical descriptors that include variance, skewness and the Spatial Gray Level Dependence Matrix (SGLD), or co-occurrence matrix, for texture description. These features are then used in conjunction with the SVM to separate the three classes from each other. The following subsections describe how these statistical descriptors are calculated.

8.4.1 Gray Level Histogram Moments (GLHM)

Two gray-level-sensitive histogram moments are extracted from the pixel value histogram of each region of interest (ROI), defined as follows:

Variance:

$$\sigma^2 = \frac{1}{XY - 1} \sum_{x=1}^{X} \sum_{y=1}^{Y} \bigl[ I(x, y) - \mu \bigr]^2 \qquad (1)$$

Skewness:

$$\frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \left[ \frac{I(x, y) - \mu}{\sigma} \right]^3 \qquad (2)$$

where X and Y denote the dimensions of the region, $I(x, y)$ is the image sub-region pixel matrix, and $\mu$ is the mean of the matrix $I(x, y)$.

8.4.2 Spatial Gray Level Dependence Matrix (SGLDM)

The co-occurrence probability provides a second-order method for generating texture features. This probability, also called the Spatial Gray Level Dependency (SGLD), represents the conditional joint probabilities of all pairwise combinations of gray levels in the spatial window of interest, given two parameters: inter-pixel distance (d) and orientation angle ($\phi$). The probability measure can be defined as:

$$\Pr(x) = \bigl\{ C_{ij} \mid (d, \phi) \bigr\} \qquad (3)$$

where $C_{ij}$ is the co-occurrence probability between gray levels i and j.

Texture features based on the spatial co-occurrence of pixel values are probably the most widely used in digital image analysis. First proposed by Haralick et al. (1973), they characterize texture using a variety of quantities derived from second-order image statistics. Statistics are applied to the Gray Level Co-occurrence Matrix (GLCM) to generate texture features, which are assigned to the center pixel of the image window.
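As a sketch of how the co-occurrence matrix of (3) might be built in practice, the snippet below uses scikit-image, an assumed dependency (the chapter does not name an implementation), to compute an angle-averaged GLCM from an 8-bit ROI and derive Haralick-style measures from it. The scikit-image "homogeneity" property corresponds to the inverse difference moment defined below.

```python
# Hedged sketch (assumption: scikit-image available; roi is an 8-bit integer
# array) of the co-occurrence matrix of equation (3) and derived measures.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi, d=1, angles=(0, np.pi / 2)):
    # P[i, j, d, theta]: normalised co-occurrence counts C_ij at distance d.
    P = graycomatrix(roi, distances=[d], angles=list(angles),
                     levels=256, symmetric=True, normed=True)
    feats = {name: graycoprops(P, name).mean()        # average over angles
             for name in ("ASM", "contrast", "homogeneity", "correlation")}
    p = P.mean(axis=(2, 3))                           # angle-averaged matrix
    feats["entropy"] = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return feats
```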
In this work, five statistical measures are extracted, using the following nomenclature in which the element $p(i, j)$ represents the frequency of co-occurrence of the two gray levels i and j for a given displacement vector.

Angular second moment (ASM), which measures the number of repeated pairs:

$$\sum_{i=1}^{N} \sum_{j=1}^{N} p(i, j)^2 \qquad (4)$$

Inverse difference moment, which reflects the smoothness of the image:

$$\sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{1 + (i - j)^2} \, p(i, j) \qquad (5)$$

Entropy, which measures the randomness of the gray level distribution:

$$-\sum_{i=1}^{N} \sum_{j=1}^{N} p(i, j) \log p(i, j) \qquad (6)$$

Correlation:

$$\frac{\left[ \sum_{i} \sum_{j} (ij) \, p(i, j) \right] - \mu_x \mu_y}{\sigma_x \sigma_y} \qquad (7)$$

where $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ are the means and standard deviations of the marginal density functions; the probability $p(i, j)$ provides a correlation between the two pixels in the pixel pair.

Contrast:

$$\sum_{n=0}^{N_g - 1} n^2 \left\{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i, j) \;:\; |i - j| = n \right\} \qquad (8)$$

8.5 SUPPORT VECTOR MACHINE

The SVM is a powerful classification algorithm, well suited to the given task (Cristianini and Shawe-Taylor, 2000; Vapnik, 1998). It addresses the general problem of learning to discriminate between positive and negative members of a given class of n-dimensional vectors. The algorithm operates by mapping the given training set into a possibly high-dimensional feature space and attempting to learn a separating hyperplane between the positive and negative examples that maximizes the margin between them (Zaki, Deris and Alashwal, 2006). The margin corresponds to the distance between the points residing on the two edges of the hyperplane, as shown in Figure 8.3.

Figure 8.3 Illustration of the hyperplane separation (w·x + b = 0, with margins w·x + b = +1 and w·x + b = -1) between the positive and negative examples in a support vector machine.

Having found such a plane, the SVM can then predict the classification of an unlabeled example. The SVM formulation is as follows. Suppose our training set S consists of labeled input vectors $(x_i, y_i)$, $i = 1, \ldots, m$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$. We can specify a linear classification rule f by a pair $(w, b)$, where the normal vector $w \in \mathbb{R}^n$ and the bias $b \in \mathbb{R}$, via

$$f(x) = \langle w, x \rangle + b \qquad (9)$$

where a point x is classified as positive if $f(x) > 0$. Geometrically, the decision boundary is the hyperplane

$$\{ x \in \mathbb{R}^n : \langle w, x \rangle + b = 0 \} \qquad (10)$$

The idea that makes it possible to deal efficiently with very high-dimensional feature spaces is the use of kernels: for all $x, z \in X$,

$$K(x, z) = \phi(x) \cdot \phi(z) \qquad (11)$$

where $\phi$ is the mapping from X to an inner-product feature space. We thus get the following optimization problem:

$$\max_{\lambda} \; \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i,j=1}^{m} \lambda_i \lambda_j y_i y_j K(x_i, x_j) \qquad (12)$$

subject to the constraints

$$\lambda_i \ge 0, \qquad \sum_{i=1}^{m} \lambda_i y_i = 0 \qquad (13)$$

Given the labeled feature vectors, we can train an SVM classifier to separate the normal, benign and malignant classes from each other. The appeal of SVMs is twofold: they do not require any complex tuning of parameters, and they exhibit a great ability to generalize from small training samples. They are particularly amenable to learning in high-dimensional spaces.
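To make the classification step concrete, here is a hedged sketch of training such a classifier on the extracted feature vectors with scikit-learn, an assumption on our part (as noted below, the study itself used the LibSVM package, which scikit-learn wraps). The C = 100 and 10-fold cross-validation values mirror the settings reported at the end of this section.

```python
# Hedged sketch (not the study's actual script): RBF-kernel SVM on the
# texture feature vectors, with the soft-margin C = 100 and 10-fold
# cross-validation settings reported in this chapter.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate(features, labels):
    # features: (n_samples, n_features) array; labels: one class per sample.
    clf = SVC(kernel="rbf", C=100.0, gamma="scale")
    scores = cross_val_score(clf, features, labels, cv=10)
    return scores.mean()
```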
In this particular implementation, we used the LibSVM software implemented by Chih-Chung Chang and Chih-Jen Lin, available for download at http://www.csie.ntu.edu.tw/~cjlin/libsvm-tools/. We primarily employed the Gaussian kernel, which utilizes radial basis functions. The Gaussian Radial Basis Function (RBF) allows pockets of data to be classified, which is more powerful than a plain linear dot product. The RBF is computed using the following equation:

$$k(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right) \qquad (14)$$

where $\sigma$ is the scaling parameter. One of the significant parameters needed to tune the system is the soft-margin parameter, or capacity, C, which controls how much tolerance for error in the classification of training samples is allowed (Zaki, Deris and Alashwal, 2006). The soft-margin parameter C and the cross-validation value are set to 100 and 10 respectively.

8.6 RESULTS

The overall classification results obtained for the 120-image dataset are summarized in Figure 8.4, together with the performance of other machine learning (ML) techniques: Linear Discriminant Analysis (LDA), Non-linear Discriminant Analysis (NDA), Principal Component Analysis (PCA) and Artificial Neural Network (ANN). The results show that the SVM achieved the best accuracy, 82.5%.

Figure 8.4 Comparison of the classification accuracy (%) of the different ML techniques (LDA, NDA, PCA, ANN and SVM).

We used a two-tailed t-test to assess how significant the differences in accuracy between the SVM and the other ML techniques are. The resulting p-values are presented in Table 8.1. As the table shows, there are performance differences between the SVM and all four other methods; at a threshold of 0.05, the difference between the SVM and NDA is statistically significant, while the differences between the SVM and LDA, PCA and ANN do not reach significance at the same threshold.

Table 8.1 Statistical significance (two-tailed t-test p-values) between pairs of ML techniques.

       NDA      PCA      ANN      SVM
LDA    0.1711   0.0491   0.1046   0.1779
NDA             0.1230   0.0677   0.0070
PCA                      0.0557   0.1299
ANN                               0.0747

8.7 CONCLUSION

Texture analysis is a promising tool for clinical decision making and one of the most valuable and promising areas in breast tissue analysis. Several factors affect its performance and are still not completely understood. In this study we analyzed regions of interest on screening mammograms in order to give radiologists an aid for the assessment of tumors. Texture analysis was performed on small ROIs, and five co-occurrence-matrix measures were calculated from each ROI. The use of statistical textural features in conjunction with the SVM delivered the most accurate results, at 82.5%. In conclusion, we suggest that texture analysis can contribute to computer-aided diagnosis of breast cancer. Completing the proposed method would require a larger dataset and the investigation of additional classification schemes.

8.8 REFERENCES

Arivazhagan, S. and Ganesan, L. (2003). Textural classification using wavelet transform. Pattern Recognition Letters, 1513-1521.

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.

Haralick, R. M., Shanmugam, K. and Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics, 610-621.

Kwok, J. T., Zhu, H. and Wang, Y. (2003). Texture classification using the support vector machines. Pattern Recognition, 2883-2893.
Lisboa, P. G. (2002). A review of evidence of health benefit from artificial neural networks. Neural Networks, 11-39.

Mavroforakis, M. E., Georgiou, H. V., Dimitropoulos, N., Cavouras, D. and Theodoridis, S. (2006). Mammographic masses characterization based on localized texture and dataset fractal analysis using linear, neural and support vector machine classifiers. Artificial Intelligence in Medicine, 145-162.

Smith, R. A. (1995). Screening women aged 40-49: where are we today? Journal of the National Cancer Institute, 1198-1199.

Tabar, L., Fagerberg, G. and Chen, R. H. (1995). Efficacy of breast screening by age: new results from the Swedish two-county trial. Cancer, 1412-1419.

Vapnik, V. N. (1998). Statistical Learning Theory. John Wiley and Sons, New York.

Zaki, N. M., Deris, S. and Alashwal, H. (2006). Protein-Protein Interaction Detection Based on Substring Sensitivity Measure. International Journal of Biomedical Sciences, 148-154.