Immunocomputing: a survey

I. Antoniou, S. Gutnikov, V. Ivanov, Yu. Melnikov, A. Tarakanov

International Solvay Institutes for Physics and Chemistry, Campus Plaine ULB, CP231, Bd. du Triomphe, Brussels 1050, Belgium
The University of Oxford, Department of Biochemistry, South Park Road, Oxford OX1 3QU, United Kingdom
Amtel Systems Overseas Ltd., PO Box 307, Circular Road 19/21, Douglas IM99 2BE, United Kingdom

Abstract
The recently introduced notion of immunocomputing is currently being implemented within the framework of the EU project IMCOMP. The aim of this project is to create a new kind of computational paradigm based on principles of information processing by proteins and immune networks in living nature. This paradigm will be used for solving specific complex problems and for protection against computer viruses, intruder attacks, noise, and random errors. The implementation of this immunocomputing paradigm will lead to the development of a new kind of computer, which we propose to call the immunocomputer by analogy with the widespread neurocomputers based on models of neurons and neural networks. The objective of this review is to compare IMCOMP with existing approaches in computer science and to highlight its novelty and advantages.

1. Introduction
Biological systems, even at the level of cells and biomolecules, can be regarded as sophisticated information processing systems and can provide inspiration for various ideas in engineering and technology. However, there are only two systems in animals that possess extraordinary information processing capabilities such as learning and memory, the ability to recognize patterns, and the ability to decide how to behave in an unfamiliar environment. These two are (1) the nervous system and (2) the immune system. The animal nervous system has already been used intensively in computer science as a biological prototype for mathematical algorithms of artificial neural networks (ANN).
Software based on ANN has been created and has found hardware implementation in neural computers [20,27]. However, the extraordinary information processing capabilities of the natural immune system have been appreciated only recently. The aim of the IMCOMP project is to create mathematical algorithms based on the principles of functioning of natural immune systems and to develop software and hardware implementations of these algorithms.

In this paper we present an overview of the IMCOMP project. An introduction to the main functional principles of natural immune systems is given in Section 2. Section 3 contains a brief review of Artificial Immune Systems (AIS) and their applications, as this is the field of computer science closest to IMCOMP. The basic elements and main functional principles of immunocomputing are described in Section 4. These include the mathematical background (4.1), principles of information processing and their application to the solution of some data processing problems (4.2), and a brief description of a prototype of the immuno-chip, the basic element of the future immunocomputer (4.3). Finally, in Section 5 the main innovations and objectives of the IMCOMP project are discussed.

2. Overview of the natural immune system
The word immunity (from Latin immunitas) means "freedom from". The main purpose of the immune system is to keep the organism free from hostile foreign organisms, cells, or molecules (collectively called pathogens). The organism's defence against intrusion is multilayered. First, there are mechanical barriers: skin, the mucus of the respiratory, digestive and urogenital tracts, tears, etc. The second barrier is environmental: excreted body fluids (sweat, saliva, tears, etc.) have physical and chemical characteristics that provide inappropriate living conditions for many pathogens.
Pathogens that manage to pass the first lines of defence and enter the body are handled by the immune system. Non-specific innate defence mechanisms exist: this innate immunity is primarily maintained by circulating scavenger cells, such as macrophages, that ingest extracellular molecules and materials, clearing the system of both pathogens and debris. For the most efficient protection, specific acquired immunity, based on the recognition and selective targeting of "non-self" patterns, has evolved. Acquired immunity is based on a sophisticated physiological mechanism that involves many different types of cells and molecules. It is also called adaptive because it is adaptively acquired during the lifetime of the organism. An important part of adaptive immunity is the ability of the immune system to "memorise" encountered pathogens and to produce an enhanced response in the case of repeated intrusion by the same or a similar pathogen.

Parts of the pathogen that are recognised by the immune system are called antigens. A single pathogen, e.g. a bacterium, may contain a large number of different antigens. The adaptive immune system can be viewed as a distributed detection system, which consists primarily of white blood cells, called lymphocytes, that circulate through the body in the blood and lymph. Detection, or recognition, occurs when molecular bonds are formed between antigens and the receptors that cover the surface of the lymphocyte. When an antigen is detected, a mechanism is triggered that causes proliferation of cells producing antibodies capable of selectively binding to that particular antigen. When an antigen is bound by an antibody, its carrier pathogen becomes a target for destruction by macrophages.

Both antigens and cell receptors are molecules of protein nature. The immune system's pattern recognition mechanism must be highly effective: it can distinguish about 10^5 "self" proteins from more than 10^16 "non-self" ones.
This powerful recognition mechanism is a property of the immune system as a whole, not of a single lymphocyte. Each lymphocyte carries on its surface receptors of only one type, and hence it can recognise only one antigen. The ability to detect most pathogens requires a huge diversity of lymphocyte receptors, which is achieved by generating receptors through a genetic process that introduces a large amount of randomness. When this random process creates a lymphocyte with receptors to a "self" protein, that lymphocyte is eliminated before it matures. Thus, only lymphocytes with receptors to "non-self" are released into circulation. In this respect, lymphocytes can be viewed as negative detectors, because they detect only "non-self" patterns and ignore "self" patterns.

Even though receptors are randomly generated, there are not enough lymphocytes in the body to provide complete coverage of the space of all possible antigens: one estimate is that there are some 10^8 different lymphocyte receptors in the body at any given time, while the potential number of antigens is of the order of 10^16. Immune protection is therefore a probabilistic process. First, pathogens usually have several different antigens, so there is a chance that at least some of them will be recognised, and that is sufficient to trigger an immune response. Second, protection is made dynamic by the continual circulation of lymphocytes through the body and by the continual turnover of the lymphocyte population. Lymphocytes are typically short-lived (several days) and are continually replaced with new lymphocytes that have new randomly generated receptors. Finally, if by misfortune the immune system of a single organism fails to recognise and resist an infection, there is a sufficient probability that other organisms in the population will have appropriate detectors at the time of infection, and this is sufficient for the survival of the species.
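The censoring of self-reactive lymphocytes described above is the biological idea behind the negative-selection algorithm of Forrest et al. [12], summarised in Section 3.2. The Python sketch below is a minimal illustration under stated assumptions: "self" is a small set of binary strings, two strings match if they agree in at least r contiguous positions, and candidate detectors are drawn at random and discarded if they match any self string. The function names, the alphabet and the parameter values are hypothetical and are not taken from IMCOMP or from [12].

```python
import random


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """Return True if equal-length strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0  # length of the current run of agreeing symbols
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r,
                       alphabet="01", seed=0, max_tries=100_000):
    """Censoring phase (analogue of eliminating self-reactive lymphocytes):
    keep random candidates that match no self string. May return fewer than
    n_detectors if max_tries is exhausted."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(string, detectors, r):
    """Monitoring phase: flag a string if any detector matches it."""
    return any(r_contiguous_match(d, string, r) for d in detectors)


if __name__ == "__main__":
    self_set = ["00000000", "00001111", "11110000"]  # hypothetical "self" strings
    detectors = generate_detectors(self_set, n_detectors=20, length=8, r=4)
    print(len(detectors), "detectors")
    print([is_nonself(s, detectors, r=4) for s in self_set])  # always all False
    print(is_nonself("01010101", detectors, r=4))  # may or may not be detected
```

By construction no detector matches a self string, so self is never flagged; a non-self string is caught only if some detector happens to cover it, which mirrors the incomplete, probabilistic coverage described above.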
Immune learning and memory provide more efficient protection against a specific pathogen. If the immune system detects an antigen it has not encountered before, it undergoes a primary response, during which it "learns" to recognise that specific antigen more effectively, i.e. it produces a large number of lymphocytes with high affinity for that antigen, through a process called affinity maturation. These so-called memory cells remain in circulation and provide faster detection and elimination of the pathogen at the next encounter.

Summary. The natural immune system has many features that are desirable from a computer science standpoint. The system is massively parallel and its functioning is truly distributed. Individual components are disposable and unreliable, yet the system as a whole is robust. Previously encountered infections are detected and eliminated quickly, while novel intrusions are detected on a slower time scale, using a variety of adaptive mechanisms. The system is autonomous, controlling its own behaviour at both the detector and effector levels. The immune systems of individual organisms detect infections in slightly different ways, so pathogens that are able to evade the defences of one organism cannot necessarily evade those of every other organism in the population.

3. Artificial immune systems
The field closest to IMCOMP is that of Artificial Immune Systems (AIS). The formation of this field could be seen as completed in 1999, when the first book on the subject was published [2]. AIS represent a new and rapidly growing field of computer science. AIS are expected to give rise to powerful and robust information processing capabilities for solving complex problems. Like ANN, AIS can learn new information, recall previously learned information and perform pattern recognition in a highly decentralized fashion.
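AIS that "learn" often do so by imitating the affinity maturation described in Section 2. The following Python toy, written for illustration only, caricatures it as a clonal-selection loop: receptors and the antigen are bit strings, affinity is the number of agreeing bits, and the better receptors are cloned with random bit flips. The representation, the affinity measure, the parameter values and all names are hypothetical simplifications, not a model used by IMCOMP.

```python
import random


def affinity(receptor, antigen):
    """Hypothetical affinity: number of positions where receptor and antigen bits agree."""
    return sum(r == a for r, a in zip(receptor, antigen))


def mutate(receptor, rate, rng):
    """Flip each bit independently with probability `rate` (a crude stand-in for hypermutation)."""
    return [1 - b if rng.random() < rate else b for b in receptor]


def affinity_maturation(antigen, pop_size=20, n_clones=5, rate=0.1,
                        generations=30, seed=1):
    """Toy clonal-selection loop; returns the best receptor and the best affinity per generation."""
    rng = random.Random(seed)
    key = lambda rec: affinity(rec, antigen)
    # Randomly generated initial repertoire.
    population = [[rng.randint(0, 1) for _ in antigen] for _ in range(pop_size)]
    population.sort(key=key, reverse=True)
    history = [key(population[0])]
    for _ in range(generations):
        # Clonal expansion with mutation of the better half of the repertoire.
        clones = [mutate(parent, rate, rng)
                  for parent in population[: pop_size // 2]
                  for _ in range(n_clones)]
        # Selection: parents and clones compete; the best pop_size cells survive,
        # so the best affinity can never decrease from one generation to the next.
        population = sorted(population + clones, key=key, reverse=True)[:pop_size]
        history.append(key(population[0]))
    return population[0], history


if __name__ == "__main__":
    antigen = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    best, history = affinity_maturation(antigen)
    print(history[0], "->", history[-1], "out of", len(antigen))
```

Letting parents compete with their own clones is a design choice that makes the best affinity monotone; it is loosely analogous to memory cells preserving what has already been learned.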
AIS have already been applied in:
– detection of faults in manufacturing
– security of information
– design of vaccines
– control of autonomous mobile robots
– mining of commercial data
– monitoring of plague foci in Central Asia.

3.1. Immune Network Model
Of special interest is the widely known theory of immune networks, formed by interactions between antibodies and immune cells. Niels Jerne, who worked at the Institut Pasteur in Paris, proposed in 1973 the general theory of idiotypic networks, also called immune networks [18]. This theory is based on the concept that immune cells (lymphocytes) are not isolated but communicate with each other, across different species of lymphocytes, through interactions among antibodies. Accordingly, antigens are identified not by a single recognizing set but by system-level recognition of sets that are connected into a network by antigen-antibody reactions. Nowadays the existence of immune networks is established beyond all doubt: their fragments and interactions have been detected experimentally. It is worth noting that similar networks, under the name of molecular circuits, have even been proposed as a possible molecular basis of neuronal memory in the human brain.

Jerne's immune network theory has received considerable attention from researchers over the last two decades, and many computational aspects of the model have been derived for practical use. From the mathematical viewpoint, it was N. Jerne who initiated the development of a rigorous framework for modelling the immune system. His theory is modelled with differential equations that simulate the dynamics of lymphocytes. Based on Jerne's work, Perelson [22] presented a probabilistic approach to idiotypic networks. His approach is highly mathematical and deals mainly with phase transitions in idiotypic networks.

3.2. Negative Selection Algorithm
Forrest et al.
[12] developed a negative-selection algorithm for change detection based on the principles of self-nonself discrimination in the immune system. The approach can be summarised as follows:
1. Define self as a collection S of strings of length l over a finite alphabet, a collection that needs to be protected or monitored. For example, S may be a normal pattern of activity (e.g. a program or a data file), segmented into equal-sized substrings.
2. Generate a set R of detectors, each of which fails to match any string in S. Instead of exact or perfect matching, the method uses a partial matching rule, in which two strings match if and only if they are identical in at least r contiguous positions, where r is a suitably chosen parameter.
3. Monitor S for changes by continually matching the detectors in R against S. If any detector ever matches, then a change is known to have occurred, because the detectors are designed not to match any of the original strings in S.
The algorithm appears to have many potential applications in change detection.

3.3. Other Models
There exist other computational models [10,11] that emulate different immunological aspects, for example the immune system's ability to detect common patterns in a noisy environment, its ability to discover and maintain coverage of diverse pattern classes, and its ability to learn effectively even when not all antibodies are expressed and not all antigens are presented. Hoffman has compared the immune system and the nervous system and has found many similarities at the level of system behaviour. Farmer et al. [10] and Bersini and Varela [3] have compared the immune system with learning classifier systems. Gilbert and Routen [14] experimented with an immune network model to create a content-addressable auto-associative memory, specifically for image recognition.

3.4. Some Applications
Models based on immune system principles are finding increasing applications in science and engineering.

3.4.1. Computer Security
S. Forrest and her group at the University of New Mexico are working on a research project with the long-term goal of building an artificial immune system for computers. Their computer immune system is intended to protect a computer against unauthorized use of computing facilities, maintain the integrity of data files, and prevent the spread of computer viruses. Their research program is based on the negative-selection algorithm.

3.4.2. Anomaly Detection in Time Series Data
Dasgupta and Forrest [6] experimented with several time series data sets (both real and model) to investigate the performance of the negative-selection algorithm in detecting anomalies in data series. The objective of this work is to develop an efficient algorithm that can be used for noticing any changes in the steady-state characteristics of a system or a process. In this case, self is taken to be the normal behaviour patterns of the monitored system. Any deviation that exceeds an allowable variation in the observed data is considered an anomaly in the behaviour pattern. The results have shown that this approach can be used as a tool for automated monitoring of safety-critical operations.

3.4.3. Fault Diagnosis
Ishida [16] studied the mutual recognition feature of the immune network model for fault diagnosis. In his implementation, fault tolerance was attained by mutual recognition of interconnected units in the studied plant; that is, system-level recognition was achieved through unit-level recognition. The results are very promising and worth further investigation. Ishiguro et al. [17] applied the immune network model to on-line fault diagnosis of plant systems. This work attempts to develop an integrated fault diagnosis method that can be used in industrial plants.

3.4.4. AIS for Pattern Recognition
Hunt and Cooke [15] investigated an AIS based on immune network theory within the context of machine learning.
Such a system combines the advantages of learning classifier systems with some of the advantages of neural networks, machine induction and case-based retrieval. They have shown the potential of AIS on a pattern recognition problem, namely the recognition of promoters in DNA sequences.

3.5. Summary
AIS are a subject of great research interest because of their powerful information processing capabilities. In particular, they perform many complex computations in a completely parallel and distributed fashion. Like ANN, AIS can learn new information, recall previously learned information and perform pattern recognition tasks in a highly decentralized fashion. In addition, learning takes place through evolutionary processes similar to those of evolutionary computation. There are many potential application areas in which immunity-based computational models appear to be very useful. However, a comparison with ANN shows that the field of AIS still lacks:
1. a clear and sound mathematical basis;
2. a hardware implementation analogous to the existing neurocomputers based on ANN.
At present, AIS are represented by software tools based on heuristic algorithms that use ideas from genetic algorithms, cellular automata, ANN, etc. Solving the above problems could therefore raise AIS, as well as their principal applications (e.g. information security), to a new level of reliability, flexibility and operating speed.

4. Immunocomputing
The natural immune system is based on the interaction of proteins. The main goal of IMCOMP is to implement the principles of information processing by proteins and immune networks in a new kind of computational paradigm, in order to solve specific complex problems while being protected from viruses, noise, errors and intrusions. We shall demonstrate that immunocomputing leads to a new kind of computer, which we propose to call the immunocomputer by analogy with the widespread neurocomputers, which are based on models of neurons and neural networks.
Three main innovations are expected to emerge from the IMCOMP project:
1. an appropriate mathematical framework (formal immune networks);
2. a new approach to information processing (immunocomputing);
3. new hardware (immuno-chips).
These are discussed in detail below.

4.1. Appropriate mathematical framework
According to biological prototypes and their mathematical models [23-26], the principal difference between IMCOMP and other types of computation should be determined by the functions of their basic elements. For example, if the artificial neuron, the basic element of ANN and neural computing, is considered as a summation with a threshold, connected to fixed neurons [27], then the protein, as the basic element of IMCOMP, imposes quite different conditions [4]: the spatial conformation of a protein is determined by the linear sequence (word) of its amino acid code, and this conformation determines the function of the protein. In fact, no existing mathematical model even approaches these demands. Thus we need to develop a new concept of the formal protein (FP) as a mathematical abstraction of the key biophysical mechanisms of natural protein behaviour. The FP has the same importance for IMCOMP as the well-known concept of the artificial (or formal) neuron has for neural computing. Within the framework of interactions between formal proteins, we intend to develop the new concept of Formal Immune Networks (FIN) and to demonstrate rigorously that such networks are able to learn, recognize and solve problems like artificial intelligence systems.

The closest approach to FIN is probably that of the mathematical models based on N. Jerne's theory of idiotypic networks. His theory can also be modelled with differential equations that simulate the dynamics of lymphocytes: the increase or decrease of the concentrations of a set of lymphocyte clones and of the corresponding immunoglobulins. However, such an approach does not consider the concrete mechanisms of interaction between biomolecules and cells.
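For concreteness, one well-known instance of such differential equations is the antibody-network model of Farmer, Packard and Perelson [10]. As it is usually summarised, the concentration x_i of antibody type i obeys dx_i/dt = c[Σ_j m_ji x_i x_j - k1 Σ_j m_ij x_i x_j + Σ_j m_ji x_i y_j] - k2 x_i, with terms for stimulation by recognising other antibodies, suppression by being recognised, stimulation by antigens of concentration y_j, and natural death. The Python sketch below integrates this form with an explicit Euler scheme. It is an illustrative approximation under assumptions: antigen concentrations are held constant, a separate matrix G holds antigen-antibody matching, concentrations are clipped at zero, and the parameter values and function names are hypothetical.

```python
def network_rates(x, y, M, G, c=1.0, k1=1.0, k2=0.5):
    """Right-hand side dx/dt of the idiotypic-network equations sketched above.

    x: antibody concentrations, y: antigen concentrations (held fixed here),
    M[i][j]: antibody-antibody matching, G[j][i]: matching of antigen j with antibody i.
    """
    n = len(x)
    rates = []
    for i in range(n):
        stimulation = sum(M[j][i] * x[i] * x[j] for j in range(n))
        suppression = k1 * sum(M[i][j] * x[i] * x[j] for j in range(n))
        antigen_drive = sum(G[j][i] * x[i] * y[j] for j in range(len(y)))
        rates.append(c * (stimulation - suppression + antigen_drive) - k2 * x[i])
    return rates


def simulate(x0, y, M, G, steps=100, dt=0.01, **params):
    """Explicit Euler integration; negative values from overshoot are clipped to zero."""
    x = list(x0)
    for _ in range(steps):
        dx = network_rates(x, y, M, G, **params)
        x = [max(0.0, xi + dt * dxi) for xi, dxi in zip(x, dx)]
    return x


if __name__ == "__main__":
    M = [[0.0, 0.0], [0.0, 0.0]]  # no idiotypic interactions in this toy case
    G = [[1.0, 0.0]]              # the single antigen matches antibody 0 only
    print(simulate([1.0, 1.0], [1.0], M, G))  # antibody 0 grows, antibody 1 decays
```

Everything about how two molecules actually bind is compressed into the fixed numbers in M and G; the equations describe only how concentrations rise and fall, which illustrates the limitation stated above.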
Therefore, it cannot form a basis for a new approach to information processing.

4.2. New approach to information processing (immunocomputing)
As an approach to information processing, IMCOMP gives rise to the following innovations:
1. new methods for pattern recognition and data mining based on the principles of biomolecular recognition (binding energy of proteins);
2. new methods for synchronization of events in computer networks based on biomolecular principles of self-synchronization (biomolecular messengers);
3. new methods for simulating the dynamics of natural processes in 3D modelling based on the principles of biomolecular interaction (excitable lattices of biomolecules);
4. new methods for language representation and problem solving based on the theory of linguistic valence (the behaviour of words is equivalent to the behaviour of biomolecules).

4.3. New hardware (immuno-chips)
As a new approach to information processing, IMCOMP needs a hardware implementation in a special type of electronic circuit: the immuno-chip. The reason is that the architectures of traditional PCs and neurocomputers are not well suited to fast, distributed immunocomputations. The most appropriate architecture for the immuno-chip can apparently be developed by analogy with the architecture of modern biochips or microarrays [9,21].

5. Discussion of follow-up
The following innovations are expected to be achieved through immunocomputing as a follow-up of the IMCOMP project. Immunocomputers would be able to overcome the main drawbacks of neurocomputers (spurious patterns, low capacity relative to the size of the neural network, difficulty in locating errors). These drawbacks hinder the wide application of neurocomputers in fields where errors are very costly, such as control and navigation of spacecraft, aircraft, ships and submarines, security systems, and intensive care medicine.
Immunocomputers could provide an effective simulation of the natural immune system and of aspects of relevant diseases, such as AIDS. Even the simplest variants of formal immune networks effectively simulate important properties of the immune response and immune memory. We expect that in the field of diagnostics (fault detection) for spacecraft, aircraft, nuclear power plants and ecology it will be possible to:
- deal with huge amounts of data under hard time constraints;
- detect critical situations, errors and faults early and reliably;
- overcome neurocomputing difficulties.
In the field of information security for computer networks, we expect the development of:
- self-learning security systems to resist unknown invaders (viruses, unauthorized users);
- software/hardware implementations of security systems.
In the field of control of mobile objects (robots, etc.), we expect improved reliability and flexibility of system behaviour in unpredictable situations. In the field of data mining, we expect the detection of small deviations and errors from normal behaviour in large amounts of data (e.g. credit card and mortgage fraud detection). In the field of management of complex socio-ecological systems, we expect the development of integrated approaches to modelling interactions between population and environment based on the resilience concept.

Acknowledgements
This work was supported by the Commission of the European Communities under Contract IST-2000-26016 IMCOMP.

References
1. Agnati L.F. Human brain in science and culture. Casa Editrice Ambrosiana, Milano, 1998 (in Italian).
2. Artificial immune systems and their applications (ed. D. Dasgupta). Springer-Verlag, Berlin, 1999.
3. Bersini H. and Varela F. Hints for adaptive problem solving gleaned from immune networks. Proc. of the 1st Workshop on Parallel Problem Solving from Nature, 1990, 343-354.
4. Bohinski R. Modern concepts in biochemistry. Allyn and Bacon, Boston, 1983.
5. Dasgupta D. and Attoh-Okine N. Immunity-based systems: a survey. Proc.
of the IEEE Int. Conf. on Systems, Man and Cybernetics. Orlando, USA, 1997.
6. Dasgupta D. and Forrest S. Novelty detection in time series data using ideas from immunology. ISCA (th Int. Conf. on Intelligent Systems. Reno, USA, 1996.
7. DeBoer R.J., Segel L.A. and Perelson A.S. Pattern formation in one- and two-dimensional shape space models of the immune system. J. Theoret. Biol., 1992, 155, 295-333.
8. Coutinho A. Immunology: the heritage of the past. Letters of the L.Pasteur Institute of Paris, 1994, 8, 26-29 (in French).
9. Ekins R. and Chu F.W. Microarrays: their origins and applications. Trends in Biotechnology, 1999, 17, 217-218.
10. Farmer J.D., Packard N.H. and Perelson A.S. The immune system, adaptation and machine learning. Physica D, 1986, 22, 187-204.
11. Forrest S., Javornik B., Smith R. and Perelson A. Using genetic algorithms to explore pattern recognition in the immune system. Evolutionary Computation, 1993, 1(3), 191-211.
12. Forrest S., Perelson A., Allen L. and Cherukuri R. Self-nonself discrimination in a computer. Proc. of the IEEE Symposium on Research in Security and Privacy. Oakland, USA, 1994, 202-212.
13. Forrest S., Hofmeyr S. and Somayaji A. Computer immunology. Communications of the ACM, 1997, 40(10), 88-96.
14. Gilbert C. and Routen T. Associative memory in an immune-based system. Proc. of the 12th Nat. Conf. on Artificial Intelligence. Seattle, USA, 1994, 852-857.
15. Hunt J. and Cooke D. Learning using an artificial immune system. J. of Network and Computer Applications, 1996, 19, 189-212.
16. Ishida Y. An immune network model and its applications to process diagnosis. Systems and Computers in Japan, 1993, 24(6), 38-45.
17. Ishiguro A., Watanabe Y. and Uchikawa Y. Fault diagnosis of plant system using immune networks. Proc. of the IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems. Las Vegas, USA, 1994, 34-42.
18. Jerne N.K. The immune system. Scientific American, 1973, 229(1), 52-60.
19. Jerne N.K.
Towards a network theory of the immune system. Ann. Immunol. (Inst. Pasteur), 1974, 125, 373-389.
20. Haykin S. Neural networks: a comprehensive foundation. Prentice Hall Inc., 1999.
21. MacBeath G. and Schreiber S.L. Printing proteins as microarrays for high-throughput function determination. Science, 2000 (8 September), 289(5485), 1760-1763.
22. Perelson A. Immune network theory. Immunological Reviews, 1989, 110, 5-36.
23. Tarakanov A.O. Mathematical models of information processing by biomolecules: formal peptide instead of formal neuron. Russian Academy of Sciences, Problems of Informatization J., 1998, 1, 46-51 (in Russian).
24. Tarakanov A. and Adamatzky A. Virtual clothing in hybrid cellular automata. 2000, http://www.ias.uwe.ac.uk/~a-adamat/clothing/cloth_06.htm
25. Tarakanov A. and Dasgupta D. A formal model of an artificial immune system. BioSystems, 2000, 55(1-3), 151-158.
26. Tarakanov A., Sokolova S., Abramov B. and Aikimbayev A. Immunocomputing of the natural plague foci. Proc. of the Genetic and Evolutionary Computation Conference (GECCO-2000), Workshop on Artificial Immune Systems, Las Vegas, USA, 2000, 38-39.
27. Wasserman P. Neural computing: theory and practice. Van Nostrand Reinhold, New York, 1990.