Evolutionary Immune inspired solutions(AIS) for Data Analysis Web Site: www.ijaiem.org Email: ,

advertisement
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013
ISSN 2319 - 4847
Evolutionary Immune inspired solutions(AIS)
for Data Analysis
Bhawani Sankar Biswal1, Anjali Mohaptra2
1
PG Student, Department of Computer Sc.& Engineering, International Institute of Information Technology
(IIIT),Bhubaneswar,Odisha-751003
2
Asst.Professor, Department of Computer Sc.& Engineering, International Institute of Information Technology
ABSTRACT
Immunity- based techniques are gaining popularity in wide area of applications , and emerging as a new branch of
Artificial Intelligence (AI). The paper surveys the major works in the field of data analysis during the last few years, in
particular, reviewed the works of existing methods and the new initiatives.
Keywords: Artificial immune system (AIS), Clonal Selection, Negative Selection, Motif Tracking Algorithm (MTA).
1. INTRODUCTION
The artificial immune system (AIS) implements a learning technique inspired by the human immune system
which is a remarkable natural defence mechanism that learns about foreign substances. However, the immune
system has not attracted the same kind of interest from the computing field as the neural operation of the brain or
the evolutionary forces used in learning classifier systems.
In this paper, we describe an artificial immune system which borrows much of its operation from theories of the
natural immune system to provide a distinguished solution for data analysis.
The remainder of the paper is organized in the following manner: Section 2 introduces to Artificial immune
system(AIS) while Section 3 describes some of the principles of artificial immune system. Section 4 considers how
the AIS can be applied to analysis of sequences and finding unknown MOTIFS in DNA sequences
2. AN INTRODUCTION TO ARTIFICIAL IMMUNE SYSTEM (AIS)
The standard approaches to computational problem-solving tasks such as pattern recognition and information search
have considerably diversified. The variety of fields to which this analysis is relevant has also widely expanded. The
significant increase of computational power, along with our understanding of problem solving paradigms. As a result
computer scientists are no longer restricted by traditional approaches to the development of solution strategies for the
problems. These individuals now look to alternative sources of inspiration for their work, enabling new and varied
solutions to be created that play a valuable role in the problem solving process.
Inspiration for many of the recent techniques applied to computational problem solving has developed from an
increased understanding of the workings of nature itself and in particular biological systems. Many of the biological
systems in nature exhibit desirable properties such as robustness, adaptively and flexibility that would prove extremely
advantageous if they could be extracted and adopted into a computational solution. Inspiration from the workings of the
brain and the nervous system led to the development of neural networks. These used machine based learning to
recognize and classify previously unseen information in a fast and flexible manner. The behavior and interaction of
large groups of ants and bees led to the development of swarm systems. Mechanisms involved in the genetic
recombination of DNA inspired the mutation and crossover operators used in genetic algorithms. Awareness of the
composition and manipulation of DNA also influenced the development of genetic programming techniques [1].
Now a days the natural immune system has become a focus of inspiration for computer scientists, though it exhibits a
wealth of properties that are desirable from a computational perspective. Such properties include robustness,
adaptability, diversity, scalability, and multiple interactions on a variety of time scales [2]. Computer scientists and
engineers aim to adopt these properties in their systems in order to improve on their current solutions or enable them to
solve ever more complex problems.
An algorithm that follows the properties of biological immune behavior is called as Artificial Immune System(AIS).
Timmis and de Castro provide a comprehensive definition of AISs as “adaptive systems, inspired by theoretical
immunology and observed immune function, principle and models, which are applied to problem solving” [3]. Work on
AISs began in the mid to late 1980’s with the development of the theoretical immune network models of Farmer [4],
Perelson [5] and later by Chowdury [6]. A number of AIS models exist, and they are used in pattern recognition, fault
Volume 2, Issue 4, April 2013
Page 292
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013
ISSN 2319 - 4847
detection, computer security, and a variety of other applications researchers are exploring in the field of science and
engineering.
3. POPULAR TECHNIQUES/PRINCIPLES USED IN AIS
3.1 CLONAL SELECTION PRINCIPLE
In addition to immune network theory, other areas of research came to prominence in the AIS field. The clonal
selection theories of Burnet [7] provided inspiration for algorithms such as CLONALG [8], which focused on adaptive
learning and memory. The theories regarding negative selection in the immune system inspired negative selection
algorithms capable of solving anomaly detection and classification tasks. Subsequent criticism of negative selection led
to the development of danger theory [9] to explain the triggers for an immune response. This theory inspired
algorithms such as the Dentritic Cell Algorithm [10] which has achieved success with data classification tasks and
noise filtering.
Figure 1 clonal selection
Volume 2, Issue 4, April 2013
Page 293
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013
ISSN 2319 - 4847
3.2 NEGATIVE SELECTION APPROACH
negative selection algorithm Forrest et al. [12], is one of the computational models of self/nonself discrimination, first
designed as a change detection method. It is one of the earliest AIS algorithms that were applied in various real-world
applications. Since it was first conceived, it has attracted many AIS researchers and practitioners and has gone through
some phenomenal evolution In spite of evolution and diversification of this method.
Figure 2 Negative Selection
AN ILLUSTRATION FOR NEGATIVE SELECTION
Volume 2, Issue 4, April 2013
Page 294
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013
ISSN 2319 - 4847
The inspiration from immune networks, clonal selection, negative selection and danger theory have led to a wealth of
AIS algorithms to tackle a diverse range of problems. In their review Timmis and Hart [11] list the following as known
applications for AIS algorithms: clustering, classification, anomaly detection, computer security, optimization,
scheduling, learning, bio-informatics, image processing, robotic control and navigation, adaptive control systems, virus
detection, web mining and pattern matching.
The attraction of the immune system and the diversity of the applications to which AISs have been applied indicate that
immune inspired solutions offer a wealth of opportunity for computer scientists and engineers. Studying the theories
associated with the development of naive immune cells through to the evolution of immunological memory gave
valuable insight into the way the immune system learns to recognize and remember information presented to it.
Theories regarding immune memory development and maintenance were of prime importance during this research.
Without adequate memory the information gained during the learning process is wasted. By doing investigation and
extracting the principles from immune memory theory we were able to develop three AIS algorithms that were capable
of addressing our data analysis application.
4. AN AIS APPLIED TO DATA ANALYSIS
In today’s environment there is a fundamental need for effective data analysis tools. The wide spread computerization
of most information systems has led to vast amounts of information being generated . The sheer magnitude and scale of
much of this information makes any kind of knowledge extraction difficult for a human being. Computerized
techniques are required in order to analyse and filter the data into a meaningful form that can be understood by users.
Human beings understand and interpret large volumes of data based on the patterns prevalent in that data. Such
patterns are then used to predict future behavior, identify similarities and relationships between datasets, label
unobserved data samples, or assess the likelihood of particular events occurring.
Such repeating patterns are commonly known as ‘MOTIFS’ and effective methods of identifying these motifs are of
significant interest and value.
When the structures of motifs are known in advance we have some methods to search the patterns. Such systems are
common in areas such as weather forecasting and stock market analysis. Given a query pattern of interest, or some
characteristics that define that pattern, a search is performed to find other sequences that are similar to that query. In
these cases the algorithm knows what it is looking for. In contrast, significantly less research has been performed on
the search for ‘unknown motifs’.
Unknown motif detection requires a search be performed when there is no prior knowledge of what motifs exist in the
data. The search is conducted with no preconceptions of what patterns are to be found. As a result the search for
unknown motifs is a considerably harder problem. The system flexibility capable of unknown motif finding has
considerable value. Known motif detection only generates matches to a user specified query. If the user has no
knowledge of what to look for, known motif detection methods are ineffective. This is not a problem for algorithms
capable of finding unknown motifs. Furthermore, the unknown motifs discovered can be used as query motifs in other
known motif searches. Alternatively they are ideal seeds for clustering algorithms and can theoretically be used as part
of a forecasting tool.
Detecting unknown motifs is a complex problem. One potential solution to identify unknown motifs is to run an
exhaustive search over the Time Series. This would compare all subsequences of all possible lengths using a given
distance measure as a similarity threshold. This theoretically full proof solution is im- practical given the magnitude of
modern data sets. A more effective solution is required. However, it is useful to use as a benchmark to ensure other
solutions do find all possible motifs.
Fundamental to the success of the system is the way in which immune cells evolve to recognize, match and remember
information that is presented to it. This allows the immune system to identify threats and react accordingly. Of critical
importance is that the immune system is able to achieve this without prior knowledge of what threats it is likely to
encounter. This recognition process is encapsulated by the development of immune cells from a naive state all the way
through to the formation of immune memory. The development of immune memory is critical to the success of immune
system in this recognition task.
Memory cells represent solutions that the immune system has generated and wishes to remember from the experiences
it has encountered. Memory develop- ment reflects a process capable of finding and storing a solution to a data analysis
problem. By understanding the evolutionary process that leads to immune memory an understanding of how the
immune system learns to adapt and solve pattern matching tasks is achieved.
The immune system must adapt to recognize new information with no prior awareness of what to look for. It must then
retain knowledge of that information so that recurrences of that data can be identified quickly and responded to. A
direct synergy can be seen between this inspiration and the objective of unknown motif detection.
For these reasons the developmental processes leading to the evolution of immune memory provide the inspiration for
much of the work. As such the Motif Tracking Algorithm (MTA)[13] provides a genuinely unique solution to a
difficult problem area that has received very little attention.
Volume 2, Issue 4, April 2013
Page 295
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013
ISSN 2319 - 4847
Considerable research has already been performed on identifying known patterns in time series [14]. In contrast little
research has been performed on looking for unknown motifs in time series. This provides an ideal opportunity for an
AIS driven approach to tackle the problem of motif detection, as a distinguishing feature of the MTA is its ability to
identify variable length unknown patterns that repeat in a time series using an evolutionary system. In many data sets
there is no prior knowledge of what patterns exist so traditional detection techniques are unsuitable. In this paper we
test the generic properties of the MTA by applying it to motif identification in two industrial data sets to asses its ability
to find variable length unknown motifs.
MTA Pseudo Code
Initiate MTA (s, r, a)
Convert Time series T to symbolic representation
Generate Symbol Matrix S
Initialise Tracker population to size a
While ( Tracker population > 0 )
{
Generate motif candidate matrix M from S
Match trackers to motif candidates
Eliminate unmatched trackers
Examine T to confirm genuine motif status
Eliminate unsuccessful trackers
Store motifs found
Proliferate matched trackers
Mutate matched trackers
}
Memory motif streamlining
5. CONCLUSION
Often, many AIS solutions incorporate significant complexity in their approach in order to mimic the biological system.
In doing so, it is hoped that some of the desirable properties of the immune system are carried over into the
computational solution.
Motif and patterns are key tools for use in data analysis.By extracting motifs that exist in data we understand the nature
and characteristic of data.The motif provides an obvious mechanism to cluster,classify and summerise data, placing
great value on these patterns While most research has focused on the search for known motifs, little research has been
performed looking for variable length unknown motifs in time series.
References
[1] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Pro- gramming, An Introduction: on the
Automatic Evolution of Computer Programs and its Applications. Morgan Kaufman, 1998R. Caves, Multinational
Enterprise and Economic Analysis, Cambridge University Press, Cambridge, 1982. (book style)
[2] D. R. Flower and J. Timmis. In Silico Immunology. Springer, 2007.
[3] H. L. N. de Castro and J. Timmis. Artificial Immune Systems: A New Com- putational Intelligence Approach.
Springer, 2002.
[4] J. D. Farmer, N. H. Packard, and A. S. Perelson. The Immune System, Adaptation and Machine Learning.
Physica D, 22:187–204, 1986.
[5] A. S. Perelson. Immune Network Theory. Immunological Review, 110: 5–36, 1989.
[6] D. Chowdury. Immune Network: An Example of Complex Adaptive Sys- tems, 1999. In
Artificial
Immune Systems and Their Applications, D. Dasgupta (Ed.) pp. 89-104, Springer
[7] F. M. Burnet. The Clonal Selection Theory of Acquired Immunity. Cam- bridge University Press, 1959.
[8] L. N. de Castro and F. J. Von Zuben. The Clonal Selection Algorithm with Engineering Applications. In
Workshop on Artificial Immune Systems and Their Applications GECCO, Las Vegas, USA, pages 36–37, 2000.
[9] . P. Matzinger. Tolerance, Danger and the Extended Family. Annual Re- views in Immunology, 12:991–1045,
1994.
[10] J. Greensmith, U. Aickelin, and S. Cayzer. Introducing Dendritic Cells as a Novel Immune-Inspired Algorithm for
Anomaly Detection. In ICARIS-05, LNCS 3627, pages 153–167, 2005.
[11] . E. Hart and J. Timmis. Application Areas of AIS: The Past, the Present and the Future. In ICARIS-05, LNCS
3627, pages 483–497, 2005.
Volume 2, Issue 4, April 2013
Page 296
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 4, April 2013
ISSN 2319 - 4847
[12] . S. Forrest, A.S. Perelson, L. Allen, and R. Cherukuri, “Self-Nonself Discrimination in a Computer,” In
Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos, CA: IEEE
Computer Society Press, 1994.
[13] William Wilson and Phil Birkin and Uwe Aickelin" Motif Detection Inspired by Immune Memory" School of
Computer Science, University of Nottingham, UK,2010.
[14] J. Lin, E. Keogh, S. Lonardi, and P. Patel. Finding motifs in time series. In the2nd workshop on temporal data
mining, at the 8th ACM SIGKDD internationalconference on knowledge discovery and data mining, July, 2002.
AUTHOR
Bhawani Sankar Biswal has received his B.E degree in Computer Sc.&Engineering from Biju patnaik
university of technology,Odisha, in 2006.He is was serving as a lecturer at Nigam Institute of Engineering
& Technology(NIET) for a period of two yearsHe is currently pursuing his Master of Engineering in
Computer Sc.&Engineering International Institute of Information Technology(IIIT),Bhubaneswar,Odisha.
His areas of interest in research are Bioinformatics, Algorithms,Wireless Communication & Wireless Networks.
Dr.Anjali Mohapatra has received her Ph,D & M.Tech degree in Computer Sc.&Engineering from
Utkal University,Bhubaneswar,Odisha,.She is currently working as an Asst,Professer in the deptt. of CSE
in International Institute of Information Technology(IIIT),Bhubaneswar,Odisha. Her areas of interest in
research are Bioinformatics, Algorithms, Data Minning etc.
Volume 2, Issue 4, April 2013
Page 297
Download