Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013 ISSN 2319 - 4847

Evolutionary Immune Inspired Solutions (AIS) for Data Analysis

Bhawani Sankar Biswal1, Anjali Mohapatra2

1 PG Student, Department of Computer Sc. & Engineering, International Institute of Information Technology (IIIT), Bhubaneswar, Odisha-751003
2 Asst. Professor, Department of Computer Sc. & Engineering, International Institute of Information Technology

ABSTRACT
Immunity-based techniques are gaining popularity in a wide range of applications and are emerging as a new branch of Artificial Intelligence (AI). This paper surveys the major works in the field of data analysis during the last few years; in particular, it reviews existing methods and new initiatives.

Keywords: Artificial immune system (AIS), Clonal Selection, Negative Selection, Motif Tracking Algorithm (MTA).

1. INTRODUCTION
The artificial immune system (AIS) implements a learning technique inspired by the human immune system, a remarkable natural defence mechanism that learns about foreign substances. However, the immune system has not attracted the same kind of interest from the computing field as the neural operation of the brain or the evolutionary forces used in learning classifier systems. In this paper, we describe an artificial immune system that borrows much of its operation from theories of the natural immune system to provide a distinctive solution for data analysis.

The remainder of the paper is organized as follows: Section 2 introduces artificial immune systems (AIS), while Section 3 describes some of their principles. Section 4 considers how an AIS can be applied to the analysis of sequences and to finding unknown motifs in DNA sequences. Section 5 concludes the paper.

2. AN INTRODUCTION TO ARTIFICIAL IMMUNE SYSTEM (AIS)
The standard approaches to computational problem-solving tasks such as pattern recognition and information search have diversified considerably. The variety of fields to which this analysis is relevant has also expanded widely. Computational power has increased significantly, along with our understanding of problem-solving paradigms. As a result, computer scientists are no longer restricted to traditional approaches when developing solution strategies. They now look to alternative sources of inspiration for their work, enabling new and varied solutions that play a valuable role in the problem-solving process.

Inspiration for many of the recent techniques applied to computational problem solving has come from an increased understanding of the workings of nature itself, and in particular of biological systems. Many biological systems exhibit desirable properties such as robustness, adaptivity and flexibility that would prove extremely advantageous if they could be extracted and adopted into a computational solution.

Inspiration from the workings of the brain and the nervous system led to the development of neural networks. These use machine-based learning to recognize and classify previously unseen information in a fast and flexible manner. The behavior and interaction of large groups of ants and bees led to the development of swarm systems. Mechanisms involved in the genetic recombination of DNA inspired the mutation and crossover operators used in genetic algorithms. Awareness of the composition and manipulation of DNA also influenced the development of genetic programming techniques [1].

Nowadays the natural immune system has become a focus of inspiration for computer scientists, as it exhibits a wealth of properties that are desirable from a computational perspective.
Such properties include robustness, adaptability, diversity, scalability, and multiple interactions on a variety of time scales [2]. Computer scientists and engineers aim to adopt these properties in their systems in order to improve on their current solutions or to solve ever more complex problems.

An algorithm that follows the properties of biological immune behavior is called an Artificial Immune System (AIS). De Castro and Timmis provide a comprehensive definition of AISs as “adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving” [3]. Work on AISs began in the mid to late 1980s with the development of the theoretical immune network models of Farmer [4] and Perelson [5], and later of Chowdury [6]. A number of AIS models exist, and they are used in pattern recognition, fault detection, computer security, and a variety of other applications that researchers are exploring in science and engineering.

3. POPULAR TECHNIQUES/PRINCIPLES USED IN AIS

3.1 CLONAL SELECTION PRINCIPLE
In addition to immune network theory, other areas of research came to prominence in the AIS field. The clonal selection theories of Burnet [7] provided inspiration for algorithms such as CLONALG [8], which focused on adaptive learning and memory. The theories regarding negative selection in the immune system inspired negative selection algorithms capable of solving anomaly detection and classification tasks. Subsequent criticism of negative selection led to the development of danger theory [9] to explain the triggers for an immune response. This theory inspired algorithms such as the Dendritic Cell Algorithm [10], which has achieved success with data classification tasks and noise filtering.
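To make the clonal selection principle of Section 3.1 more concrete, the following is a minimal sketch of a clonal-selection-style search loop: select the highest-affinity cells, clone them, mutate the clones (better cells mutate less), and keep the best as memory. It is not CLONALG [8] itself. The affinity function, the parameter names and all numeric values are assumptions chosen only for illustration.

```python
import random


def affinity(x):
    # Hypothetical affinity measure: higher is better, with a peak at x = 3.0.
    return -(x - 3.0) ** 2


def clonal_selection(generations=50, pop_size=20, n_select=5,
                     clones_per_cell=4, seed=0):
    """Toy clonal-selection loop over real-valued 'antibodies' (illustrative only)."""
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the cells with the highest affinity.
        population.sort(key=affinity, reverse=True)
        selected = population[:n_select]
        clones = []
        for rank, cell in enumerate(selected):
            # Hypermutation: the mutation step grows with rank,
            # so higher-affinity cells are perturbed less.
            step = 0.1 * (rank + 1)
            for _ in range(clones_per_cell):
                clones.append(cell + rng.gauss(0.0, step))
        # Memory: parents compete with their clones, so the best cell is never lost.
        pool = sorted(selected + clones, key=affinity, reverse=True)
        # Diversity: replace the weakest part of the population with random newcomers.
        n_new = pop_size // 4
        newcomers = [rng.uniform(-10.0, 10.0) for _ in range(n_new)]
        population = pool[:pop_size - n_new] + newcomers
    return max(population, key=affinity)


best = clonal_selection()
print(f"best cell found: {best:.3f}")
```

Because the selected parents stay in the pool alongside their clones, the best affinity in the population cannot decrease from one generation to the next, which is a simple stand-in for the role of immune memory.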
Figure 1: Clonal selection

3.2 NEGATIVE SELECTION APPROACH
The negative selection algorithm of Forrest et al. [12] is one of the computational models of self/nonself discrimination, first designed as a change detection method. It is one of the earliest AIS algorithms to be applied in various real-world applications. Since it was first conceived, it has attracted many AIS researchers and practitioners and has undergone considerable evolution and diversification.

Figure 2: Negative selection (an illustration)

The inspiration from immune networks, clonal selection, negative selection and danger theory has led to a wealth of AIS algorithms that tackle a diverse range of problems. In their review, Hart and Timmis [11] list the following as known applications for AIS algorithms: clustering, classification, anomaly detection, computer security, optimization, scheduling, learning, bio-informatics, image processing, robotic control and navigation, adaptive control systems, virus detection, web mining and pattern matching. The attraction of the immune system and the diversity of the applications to which AISs have been applied indicate that immune inspired solutions offer a wealth of opportunity for computer scientists and engineers.

Studying the theories associated with the development of naive immune cells through to the evolution of immunological memory gave valuable insight into the way the immune system learns to recognize and remember information presented to it. Theories regarding immune memory development and maintenance were of prime importance during this research.
Without adequate memory the information gained during the learning process is wasted. By investigating immune memory theory and extracting its principles we were able to develop three AIS algorithms capable of addressing our data analysis application.

4. AN AIS APPLIED TO DATA ANALYSIS
In today’s environment there is a fundamental need for effective data analysis tools. The widespread computerization of most information systems has led to vast amounts of information being generated. The sheer magnitude and scale of much of this information makes any kind of knowledge extraction difficult for a human being. Computerized techniques are required in order to analyse and filter the data into a meaningful form that can be understood by users.

Human beings understand and interpret large volumes of data based on the patterns prevalent in that data. Such patterns are then used to predict future behavior, identify similarities and relationships between datasets, label unobserved data samples, or assess the likelihood of particular events occurring. Such repeating patterns are commonly known as ‘motifs’, and effective methods of identifying these motifs are of significant interest and value.

When the structure of a motif is known in advance, established methods exist to search for it. Such systems are common in areas such as weather forecasting and stock market analysis. Given a query pattern of interest, or some characteristics that define that pattern, a search is performed to find other sequences that are similar to that query. In these cases the algorithm knows what it is looking for.

In contrast, significantly less research has been performed on the search for ‘unknown motifs’. Unknown motif detection requires a search to be performed when there is no prior knowledge of what motifs exist in the data. The search is conducted with no preconceptions of what patterns are to be found.
As a result the search for unknown motifs is a considerably harder problem.

The flexibility of a system capable of finding unknown motifs has considerable value. Known motif detection only generates matches to a user-specified query. If the user has no knowledge of what to look for, known motif detection methods are ineffective. This is not a problem for algorithms capable of finding unknown motifs. Furthermore, the unknown motifs discovered can be used as query motifs in other known motif searches. Alternatively, they are ideal seeds for clustering algorithms and can theoretically be used as part of a forecasting tool.

Detecting unknown motifs is a complex problem. One potential way to identify unknown motifs is to run an exhaustive search over the time series. This would compare all subsequences of all possible lengths, using a given distance measure and a similarity threshold. This theoretically foolproof solution is impractical given the magnitude of modern data sets, so a more effective solution is required. However, the exhaustive search is useful as a benchmark to check that other solutions do find all possible motifs.

Fundamental to the success of the immune system is the way in which immune cells evolve to recognize, match and remember information that is presented to them. This allows the immune system to identify threats and react accordingly. Of critical importance is that the immune system is able to achieve this without prior knowledge of what threats it is likely to encounter. This recognition process is encapsulated by the development of immune cells from a naive state all the way through to the formation of immune memory. The development of immune memory is critical to the success of the immune system in this recognition task. Memory cells represent solutions that the immune system has generated and wishes to remember from the experiences it has encountered. Memory development reflects a process capable of finding and storing a solution to a data analysis problem.
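The exhaustive benchmark search described earlier in this section can be sketched as follows. This is a minimal illustration that assumes Euclidean distance as the distance measure and compares only non-overlapping subsequences. The function names, the toy series and the threshold are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    # Distance measure between two equal-length subsequences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exhaustive_motif_search(series, min_len, max_len, threshold):
    """Compare every pair of non-overlapping subsequences of every length.

    Returns a list of (length, i, j, distance) for each pair starting at
    positions i < j whose distance is within the threshold.
    """
    matches = []
    n = len(series)
    for length in range(min_len, max_len + 1):
        for i in range(n - length + 1):
            # Start j at i + length so the two windows never overlap.
            for j in range(i + length, n - length + 1):
                d = euclidean(series[i:i + length], series[j:j + length])
                if d <= threshold:
                    matches.append((length, i, j, d))
    return matches


# Toy series in which the pattern 0, 1, 5, 1, 0 occurs twice.
series = [0, 1, 5, 1, 0, 2, 2, 0, 1, 5, 1, 0]
found = exhaustive_motif_search(series, min_len=4, max_len=5, threshold=0.0)
for length, i, j, d in found:
    print(f"length {length}: positions {i} and {j}, distance {d:.2f}")
```

For each candidate length the number of window pairs grows roughly with the square of the series length, and every length has to be tried, which is why this approach serves as a benchmark rather than a practical tool for large data sets.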
By understanding the evolutionary process that leads to immune memory, we gain an understanding of how the immune system learns to adapt and solve pattern matching tasks. The immune system must adapt to recognize new information with no prior awareness of what to look for. It must then retain knowledge of that information so that recurrences of that data can be identified quickly and responded to. A direct synergy can be seen between this inspiration and the objective of unknown motif detection. For these reasons the developmental processes leading to the evolution of immune memory provide the inspiration for much of the work. As such, the Motif Tracking Algorithm (MTA) [13] provides a distinctive solution to a difficult problem area that has received very little attention.

Considerable research has already been performed on identifying known patterns in time series [14]. In contrast, little research has been performed on looking for unknown motifs in time series. This provides an ideal opportunity for an AIS-driven approach to tackle the problem of motif detection, as a distinguishing feature of the MTA is its ability to identify variable-length unknown patterns that repeat in a time series using an evolutionary system. In many data sets there is no prior knowledge of what patterns exist, so traditional detection techniques are unsuitable. In this paper we test the generic properties of the MTA by applying it to motif identification in two industrial data sets to assess its ability to find variable-length unknown motifs.
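The idea of finding variable-length repeating patterns without a query can be illustrated with a much simplified sketch. It converts a series to symbols, keeps short words that repeat, and then lengthens the surviving candidates one symbol at a time until none repeat. This is not the MTA of [13]: it omits the confirmation against the original series and the memory streamlining, and the breakpoints, alphabet and function names are assumptions made for illustration only.

```python
from collections import defaultdict


def symbolise(series, breakpoints=(-0.5, 0.5), alphabet="abc"):
    """Map each value to a symbol according to the interval it falls in."""
    symbols = []
    for value in series:
        # Count how many breakpoints the value exceeds to pick its symbol.
        index = sum(value > b for b in breakpoints)
        symbols.append(alphabet[index])
    return "".join(symbols)


def word_positions(text, length):
    """Start positions of every word of the given length (overlaps allowed)."""
    table = defaultdict(list)
    for i in range(len(text) - length + 1):
        table[text[i:i + length]].append(i)
    return table


def grow_motifs(text, start_len=2, min_count=2):
    """Keep candidate words that repeat, then extend the survivors symbol by symbol."""
    length = start_len
    current = word_positions(text, length)
    trackers = {w for w, p in current.items() if len(p) >= min_count}
    motifs = {}
    while trackers:
        # Store every candidate that is still matched at this length.
        for word in trackers:
            motifs[word] = current[word]
        # Extend survivors by one symbol; extensions that no longer
        # repeat are eliminated.
        longer = word_positions(text, length + 1)
        trackers = {w for w, p in longer.items()
                    if len(p) >= min_count and w[:length] in trackers}
        current = longer
        length += 1
    return motifs


series = [0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.9]
text = symbolise(series)
motifs = grow_motifs(text)
print(text, motifs)
```

The loop stops by itself once no candidate repeats at the next length, so the motif length does not need to be specified in advance.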
MTA Pseudo Code

Initiate MTA (s, r, a)
Convert Time series T to symbolic representation
Generate Symbol Matrix S
Initialise Tracker population to size a
While ( Tracker population > 0 )
{
    Generate motif candidate matrix M from S
    Match trackers to motif candidates
    Eliminate unmatched trackers
    Examine T to confirm genuine motif status
    Eliminate unsuccessful trackers
    Store motifs found
    Proliferate matched trackers
    Mutate matched trackers
}
Memory motif streamlining

5. CONCLUSION
Many AIS solutions incorporate significant complexity in their approach in order to mimic the biological system. In doing so, it is hoped that some of the desirable properties of the immune system are carried over into the computational solution. Motifs and patterns are key tools for use in data analysis. By extracting the motifs that exist in data we understand the nature and characteristics of that data. Motifs provide an obvious mechanism to cluster, classify and summarise data, which places great value on these patterns. While most research has focused on the search for known motifs, little research has been performed on looking for variable-length unknown motifs in time series.

References
[1] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming, An Introduction: On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann, 1998.
[2] D. R. Flower and J. Timmis. In Silico Immunology. Springer, 2007.
[3] L. N. de Castro and J. Timmis. Artificial Immune Systems: A New Computational Intelligence Approach. Springer, 2002.
[4] J. D. Farmer, N. H. Packard, and A. S. Perelson. The Immune System, Adaptation and Machine Learning. Physica D, 22:187–204, 1986.
[5] A. S. Perelson. Immune Network Theory. Immunological Reviews, 110:5–36, 1989.
[6] D. Chowdury. Immune Network: An Example of Complex Adaptive Systems. In D. Dasgupta (Ed.), Artificial Immune Systems and Their Applications, pages 89–104. Springer, 1999.
[7] F. M. Burnet. The Clonal Selection Theory of Acquired Immunity. Cambridge University Press, 1959.
[8] L. N. de Castro and F. J. Von Zuben. The Clonal Selection Algorithm with Engineering Applications. In Workshop on Artificial Immune Systems and Their Applications, GECCO, Las Vegas, USA, pages 36–37, 2000.
[9] P. Matzinger. Tolerance, Danger and the Extended Family. Annual Review of Immunology, 12:991–1045, 1994.
[10] J. Greensmith, U. Aickelin, and S. Cayzer. Introducing Dendritic Cells as a Novel Immune-Inspired Algorithm for Anomaly Detection. In ICARIS-05, LNCS 3627, pages 153–167, 2005.
[11] E. Hart and J. Timmis. Application Areas of AIS: The Past, the Present and the Future. In ICARIS-05, LNCS 3627, pages 483–497, 2005.
[12] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri. Self-Nonself Discrimination in a Computer. In Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos, CA. IEEE Computer Society Press, 1994.
[13] W. Wilson, P. Birkin, and U. Aickelin. Motif Detection Inspired by Immune Memory. School of Computer Science, University of Nottingham, UK, 2010.
[14] J. Lin, E. Keogh, S. Lonardi, and P. Patel. Finding Motifs in Time Series. In the 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002.
AUTHOR

Bhawani Sankar Biswal received his B.E. degree in Computer Sc. & Engineering from Biju Patnaik University of Technology, Odisha, in 2006. He served as a lecturer at Nigam Institute of Engineering & Technology (NIET) for a period of two years. He is currently pursuing his Master of Engineering in Computer Sc. & Engineering at International Institute of Information Technology (IIIT), Bhubaneswar, Odisha. His research interests are Bioinformatics, Algorithms, Wireless Communication and Wireless Networks.

Dr. Anjali Mohapatra received her Ph.D. and M.Tech degrees in Computer Sc. & Engineering from Utkal University, Bhubaneswar, Odisha. She is currently working as an Asst. Professor in the Department of CSE at International Institute of Information Technology (IIIT), Bhubaneswar, Odisha. Her research interests are Bioinformatics, Algorithms, Data Mining etc.