The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 1, No. 4, September-October 2013

AIS-PRMACA: Artificial Immune System based Multiple Attractor Cellular Automata for Strengthening PRMACA, Promoter Region Identification

Pokkuluri Kiran Sree*, Inampudi Ramesh Babu** & S.S.S.N. Usha Devi Nedunuri***

*Research Scholar, Department of Computer Science & Engineering, Jawaharlal Nehru Technological University Hyderabad, INDIA. E-Mail: profkiransree@gmail.com
**Professor, Department of Computer Science & Engineering, Acharya Nagarjuna University, INDIA.
***Assistant Professor, Department of Computer Science & Engineering, Jawaharlal Nehru Technological University Kakinada, INDIA.

Abstract: Identifying promoter regions is an important problem in bioinformatics. Transcription of a particular gene of DNA is initiated by a promoter. Many techniques have been employed to identify these promoter regions, but all of them achieve an average accuracy of 84%. A new algorithm based on an artificial immune system, AIS-PRMACA, is proposed to strengthen and improve the accuracy of the promoter prediction system PRMACA. The proposed classifier was tested with the BDGP and ENCODE data sets and produced an accuracy of 90.5%, a considerable improvement over the accuracy of the conventional methods available.

Keywords: Artificial Immune System; Cellular Automata; MACA; Promoter Region.

Abbreviations: Cellular Automata (CA); Dependency Vector (DV); Dependency String (DS); Genetic Algorithm (GA); Multiple Attractor Cellular Automata (MACA).

I. INTRODUCTION

Promoters are responsible for many important biochemical functions, such as nutrient transportation and the making of muscle fibers, which occupy macro regions in the human body. Specifically, promoters are chains of amino acids and DNA sequences, of which there are 20 different types, coupled by peptide bonds [Miskimins et al., 1985].
The structural hierarchy possessed by promoters is typically referred to as the primary and tertiary regions. Predicting promoter regions from amino acid sequences is of tremendous value to the biological community, because the higher-level and secondary-level regions [P. Kiran Sree et al., 2013], [Miskimins et al., 1985] determine the function of the promoters, and insight into that function can consequently be inferred from them.

II. RELATED WORKS IN PROMOTER REGION IDENTIFICATION

Nearest neighbor techniques attempt to predict the region of a central residue within a segment of amino acids [Kiran Sree & Dr. Inampudi Ramesh Babu, 2013; 2013A], based on the known regions of homologous segments. Steen Knudsen et al. [Bauer et al., 1988] used statistical classifiers to identify promoter regions. Techniques for region identification include, but are not limited to, constraint programming methods, statistical approaches that predict the probability of an amino acid being in one of the structural elements, and Bayesian network models. We previously proposed an algorithm that identifies promoter regions using CA with text clustering, which achieved a lower accuracy. We have also proposed an algorithm [P. Kiran Sree et al., 2013] based on true MACA that achieved considerable accuracy [Abagyan et al., 1997; Maji & Chaudhuri, 2004].

ISSN: 2321 – 2381

III. CELLULAR AUTOMATA

A cellular automaton (CA) is a simple model of a spatially extended decentralized system, made up of a number of individual components (cells). Communication among the constituent cells is limited to local interaction. Each individual cell is in a specific state that changes over time depending on the states of its neighbors.
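The local update described above can be illustrated with a minimal one-dimensional, two-state CA. This is only a sketch under stated assumptions, not the authors' implementation: the three-cell neighborhood, the periodic boundary, the Wolfram rule numbering, the default rule 90 and the function names `step` and `run` are all choices made here for illustration.

```python
def step(cells, rule=90):
    """Advance a 1-D binary CA by one time step.

    cells: list of 0/1 cell states (the grid).
    rule:  Wolfram rule number (0-255) encoding the transition function.
    Assumption: 3-cell neighborhood with periodic (wrap-around) boundaries.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # value 0..7
        nxt.append((rule >> neighborhood) & 1)              # look up the rule bit
    return nxt


def run(cells, steps, rule=90):
    """Return the list of configurations: the start state plus `steps` updates."""
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    start = [0, 0, 0, 1, 0, 0, 0]
    for row in run(start, 3):
        print("".join("#" if c else "." for c in row))
```

Rule 90 sets each cell to the XOR of its two neighbors, so it is a linear (additive) rule; it is used here only because it is easy to check by hand.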
From the days of Von Neumann, who first proposed the model of Cellular Automata (CA) [Reese, 2001; Debasis Mitra & Michael Smith, 2004], to Wolfram's recent book 'A New Kind of Science', the simple and local neighborhood region of CA has attracted researchers from diverse disciplines. It has been subjected to rigorous mathematical and physical analysis [Kiran Sree & Dr. Inampudi Ramesh Babu, 2008; 2010] for the past fifty years, and its application has been proposed in different branches of science, both social and physical.

Definition: A CA is defined as a four-tuple <G, Z, N, F>, where
G -> Grid (set of cells)
Z -> Set of possible cell states
N -> Set which describes the cells' neighborhoods
F -> Transition function (rules of the automaton)

The evolution process is directed by the popular Genetic Algorithm (GA), with its underlying philosophy of survival of the fittest gene. This GA framework can be adopted to arrive at the desired CA rule region appropriate for modeling a physical system. The goals of the GA formulation are to enhance the understanding of the ways a CA performs computations, to learn how a CA may be evolved to perform a specific computational task, and to understand how evolution creates complex global behavior in a locally interconnected system of simple cells.

© 2013 | Published by The Standard International Journals (The SIJ)

IV. ARTIFICIAL IMMUNE SYSTEM (AIS) LEARNING

AIS learning involves the task of learning from a series of examples. The end objective is to deal with any general type of data, including cases where the number and type of attributes may vary, and where additional layers of learning are superimposed, with a hierarchical structure of attributes and classes. AIS learning aims to generate classifying expressions simple enough to be understood easily by humans. It must mimic human reasoning sufficiently to provide insight into the AIS process.
Background knowledge may be exploited in the development of AIS learning, but operation is assumed to proceed without human intervention.

4.1. Artificial Immune System (AIS) Tree

An AIS tree performs multistage hierarchical decision making. Pattern recognition based on AIS trees was motivated by the need to interpret images from remote sensing satellites. AIS trees in particular, and induction methods in general, are associated with AIS learning as a way to avoid the knowledge acquisition bottleneck for expert systems. An overview of work on AIS trees in pattern recognition can be found in.

4.2. AIS-PRMACA Tree Building

Input: Training set S = {S1, S2, ..., SK}
Output: AIS-PRMACA tree
Partition(S, K)
Step 1: Generate an AIS-PRMACA with k attractor basins.
Step 2: Distribute the training set S into the k attractor basins (nodes).
Step 3: Evaluate the distribution in each attractor basin.
Step 4: If all the examples (S') of an attractor basin (node) belong to a single class, then label the attractor basin.
Step 5: If the examples (S') of an attractor basin belong to K' AIS-MACA classes, then Partition(S', K').
Step 6: Stop.

A high-level comparative perspective on the classification literature in pattern recognition and artificial intelligence can be found in. Tree induction from a statistical perspective is reviewed in. A majority of the work on AIS trees [Kiran Sree & Dr. Inampudi Ramesh Babu, 2009] in AIS learning is an offshoot of Breiman's work and Quinlan's algorithm. Moret provided an overview of the work on representing Boolean functions as AIS trees and diagrams and its application in pattern recognition. Safavian and Landgrebe surveyed the literature on AIS tree classifiers, almost entirely from a pattern recognition perspective.

The model is built to describe a predefined set of data classes. A sample set from the database, each member belonging to one of the predefined classes, is used to train the model.
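The recursive Partition procedure of Section 4.2 can be sketched in Python as follows. This is an illustration under loud assumptions rather than the authors' implementation: patterns are taken to be bit strings packed into integers, and the hypothetical helper `make_basin_assigner` merely stands in for "generate an AIS-PRMACA with k attractor basins" by using random GF(2) parity masks instead of a real MACA. The `max_depth` cut-off with majority labelling is also an addition, needed only so that the sketch always terminates.

```python
import random
from collections import Counter


def make_basin_assigner(k, n_bits, rng):
    """Stand-in for Step 1 (hypothetical helper, not a real MACA).

    Assumption: each random non-zero mask yields one parity bit of the basin
    index; a real MACA would run its CA rules until an attractor is reached.
    """
    n_masks = max(1, (k - 1).bit_length())
    masks = [rng.randrange(1, 1 << n_bits) for _ in range(n_masks)]

    def assign(pattern):
        index = 0
        for mask in masks:
            parity = bin(pattern & mask).count("1") & 1
            index = (index << 1) | parity
        return index % k

    return assign


def partition(samples, k, n_bits, rng, depth=0, max_depth=20):
    """Build the tree. samples is a list of (pattern, label) pairs."""
    assign = make_basin_assigner(k, n_bits, rng)              # Step 1
    basins = {}
    for pattern, label in samples:                            # Step 2
        basins.setdefault(assign(pattern), []).append((pattern, label))
    node = {"assign": assign, "children": {}}
    for basin, members in basins.items():                     # Step 3
        counts = Counter(label for _, label in members)
        if len(counts) == 1 or depth >= max_depth:            # Step 4
            # Pure basin (or depth limit reached): label with the majority class.
            node["children"][basin] = counts.most_common(1)[0][0]
        else:                                                 # Step 5
            node["children"][basin] = partition(
                members, len(counts), n_bits, rng, depth + 1, max_depth)
    return node                                               # Step 6


def classify(tree, pattern, default=None):
    """Follow basin assignments down to a labelled leaf; default if unseen."""
    while isinstance(tree, dict):
        tree = tree["children"].get(tree["assign"](pattern), default)
    return tree


if __name__ == "__main__":
    rng = random.Random(0)
    training = [(0b00001111, "promoter"), (0b11110000, "non-promoter"),
                (0b00111100, "promoter"), (0b11000011, "non-promoter")]
    tree = partition(training, k=2, n_bits=8, rng=rng)
    print([classify(tree, p) for p, _ in training])
```

Each internal node stores its own basin assigner, so classification simply replays the assignments made while the tree was built; a pattern that falls into a basin never seen during training returns `default`.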
The training phase is also termed supervised learning of the classifier. Each member may have multiple features/attributes. The classifier is trained based on a specific feature/metric. Subsequent to training, the model performs the task of prediction in the testing phase. The class of an input sample is predicted based on some metric, typically a distance metric.

4.3. Random Generation of Initial Population

To form the initial population, it must be ensured that each randomly generated solution is a combination of an n-bit DS with 2^m attractor basins (Classifier #1) and an m-bit DV (Classifier #2). The chromosomes are randomly synthesized according to the following steps.
1. Randomly partition n into m integers such that n1 + n2 + ... + nm = n.
2. For each ni, randomly generate a valid Dependency Vector (DV).
3. Synthesize the Dependency String (DS) for Classifier #1 by concatenating the m DVs.
4. Randomly synthesize an m-bit Dependency Vector (DV) for Classifier #2.
5. Synthesize a chromosome by concatenating Classifier #1 and Classifier #2.

4.4. Mutation Algorithm

The mutation algorithm emulates the normal mutation scheme. It makes a minimal change in an existing chromosome of the PP (Present Population) to form a new chromosome for the NP (Next Population). In the current GA formulation, the chromosome is mutated at a single point. Figure 1 shows a simple mutation example.

V. EXPERIMENTAL RESULTS

Experiments were conducted on the ENCODE and BDGP data sets. The output and the results (Table 1) are presented below.
Figure 1: Simple Mutation

AIS-PRMACA Output
*******************************************************************
31958321; length 12001
Length of sequence: 12001
4 promoter/enhancer(s) are predicted
Promoter Pos: 6078 LDF: +16.297 TATA box at 6049 +5.597 TATAAAGT
Pos: 5942 Score: +12.499
Promoter Pos: 1363 LDF: +5.235 TATA box at 1336 +6.514 AATAAAAG
Promoter Pos: 7068 LDF: +1.165 TATA box at 7039 +4.190 TAAAAATA
Promoter Pos: 9650 LDF: +1.051 TATA box at 9618 +4.491 GTTAAAAA
*******************************************************************

Table 1: Predictive Accuracy of Promoter Prediction
Algorithm          DNA Sequence    Amino Acid Sequence
Bayesian           51%             46%
Normal ID-CA       68%             72%
Neural Networks    74%             68%
PRMACA             79%             72.5%
AIS-PRMACA         89.6%           91%

VI. CONCLUSION

AIS-PRMACA was trained to identify promoter regions from DNA sequences as well as amino acid sequences. Thorough analysis and extensive testing and training were conducted over several data sets to improve the accuracy of the proposed classifier. The experimental results over the ENCODE and BDGP data sets indicate that the average accuracy of the classifier is 90.5%. The artificial immune system technique, employed earlier to strengthen protein coding region identification and protein structure prediction, also seems promising for improving classifier accuracy in many other problems related to bioinformatics.

REFERENCES

[1] Miskimins, W. Keith, Michael P. Roberts, Alan McClelland, & Frank H. Ruddle (1985), “Use of a Protein-Blotting Procedure and a Specific DNA Probe to Identify Nuclear Proteins that Recognize the Promoter Region of the Transferrin Receptor Gene”, Proceedings of the National Academy of Sciences, Vol. 82, No. 20, Pp. 6741–6744.
[2] C.E. Bauer, D.A. Young & B.L. Marrs (1988), “Analysis of the Rhodobacter capsulatus puf Operon. Location of the Oxygen-Regulated Promoter Region and the Identification of an Additional puf-encoded Gene”, Journal of Biological Chemistry, Vol. 263, No. 10, Pp. 4820–4827.
[3] R. Abagyan, S. Batalov, T. Cardozo, M. Totrov, J. Webber & Y. Zhou (1997), “Homology Modeling with Internal Coordinate Mechanics: Deformation Zone Mapping and Improvements of Models via Conformational Search”, Promoters: Region, Function and Genetics, No. 1, Pp. 29–37.
[4] M.G. Reese (2001), “Application of a Time-Delay Neural Network to Promoter Annotation in the Drosophila melanogaster Genome”, Comput Chem, Vol. 26, No. 1, Pp. 51–56.
[5] Eric E. Snyder & Gary D. Stormo (2002), “Identification of Protein Coding Regions in Genomic DNA”, ICCS Transactions.
[6] Debasis Mitra & Michael Smith (2004), “Digital Sequences Processing in Promoter Region Identification”, Innovations in Applied Artificial Intelligence, Lecture Notes in Computer Science, Vol. 3029, Pp. 40–49.
[7] P. Maji & P.P. Chaudhuri (2004), “FMACA: A Fuzzy Cellular Automata based Pattern Classifier”, Proceedings of 9th International Conference on Database Systems, Korea, Pp. 494–505.
[8] P. Kiran Sree & Dr. Inampudi Ramesh Babu (2008), “A Novel Promoter Coding Region Identifying Tool using Cellular Automata Classifier with Trust-Region Method and Parallel Scan Algorithm (NPCRITCACA)”, International Journal of Biotechnology & Biochemistry (IJBB), Vol. 4, No. 2, Pp. 177–189.
[9] P. Kiran Sree & Dr. Inampudi Ramesh Babu (2009), “Investigating an Artificial Immune System to Strengthen the Promoter Region Identification and Promoter Coding Region Identification using Cellular Automata Classifier”, International Journal of Bioinformatics Research and Applications, Vol. 5, No. 6, Pp. 647–662.
[10] P. Kiran Sree & Dr. Inampudi Ramesh Babu (2010), “Identification of Promoter Region in Genomic DNA using Cellular Automata based Text Clustering”, The International Arab Journal of Information Technology (IAJIT), Vol. 7, No. 1, Pp. 75–78.
[11] P. Kiran Sree, Dr. Inampudi Ramesh Babu & S.S.S.N. Usha Devi Nedunuri (2013), “PRMACA: A Promoter Region identification using Multiple Attractor Cellular Automata (MACA) in ICT and Critical Infrastructure”, Proceedings of the 48th Annual Convention of Computer Society of India, Vol. 248, Pp. 393–399.
[12] P. Kiran Sree & Dr. Inampudi Ramesh Babu (2013), “PSMACA: An Automated Protein Structure Prediction using MACA (Multiple Attractor Cellular Automata)”, Journal of Bioinformatics and Intelligent Control (JBIC), American Scientific Publications, USA, Vol. 2, No. 3, Pp. 1–5.
[13] P. Kiran Sree & Dr. Inampudi Ramesh Babu (2013A), “An Extensive Report on Cellular Automata based Artificial Immune System for Strengthening Automated Protein Prediction”, Advances in Biomedical Engineering Research (ABER), Science Publications (USA), Vol. 1, No. 3, Pp. 45–51.

Prof P. Kiran Sree is working as a Professor in the Department of CSE at BVC Engineering College. He has published forty technical papers in international journals and conferences. His areas of interest include parallel algorithms, artificial intelligence, compiler design and computer networks. He has been a reviewer for many IEEE Society conferences and journals in artificial intelligence and networks. His biography is listed in Marquis Who's Who in the World, 29th Edition (2012), USA.

Dr Inampudi Ramesh Babu is working as a Professor in the Department of Computer Science & Engineering, Acharya Nagarjuna University. He has also been an Academic Senate member of the same university since 2006. He has held many positions at Acharya Nagarjuna University, including Head, Director of the Computer Centre, Chairman of the PG Board of Studies, Member of the Executive Council, Special Officer, Additional Convener of ICET Examinations and Convener of MCA Admissions. He is currently supervising ten PhD students who are working in different areas of image processing and artificial intelligence. He has published 100 papers in international journals and conferences.

Smt S. S. S. N. Usha Devi is working as an Assistant Professor in the Department of CSE at University College of Engineering, JNTU Kakinada. She has published five papers in international journals and four in international conferences. She is a member of IAENG.