Protein Structure Prediction by Applying an Artificial Immune Network
Volodymyr Lytvynenko1, Alexander Kornelyuk2, Konstantin Odynets2, Dmytro Peleshko3
1Informatics and Computer Science Department, Kherson National Technical University, Bereslavskoe Shose, 24, Kherson, 73008, UKRAINE, E-mail: immun56@gmail.com
2Department of Protein Engineering and Bioinformatics, Institute of Molecular Biology and Genetics NAS of Ukraine, 150, Acad. Zabolotnogo Str., Kyiv, 03143, UKRAINE, E-mail: kornelyuk@imbg.org.ua
3Department of Information Technology Publishing, Lviv Polytechnic National University, S. Bandery Str., 12, Lviv, 79013, UKRAINE, E-mail: dpelesko@gmail.com
Abstract – In this work, an artificial immune network algorithm is proposed for predicting the tertiary structure of proteins. Among the many available approaches, the artificial immune network is a promising cooperative computational method for solving the protein structure prediction problem in reasonable time. To automate the choice of parameter values, self-organization is used to steer the prediction process. Torsion angles, the local structural parameters that define the protein backbone, are used to encode the antibodies, which improves the quality of the predicted conformation. Peptides are used to assess the validity of the proposed algorithm. The predicted structures show clear improvements in root mean square deviation (RMSD) when superimposed on the native structures, which indicates the overall performance of the algorithm. The proposed algorithm is promising: it contributes to the prediction of native-like structures while reducing the time and effort required. In addition, the energy of the predicted structure is minimized to a large extent, which indicates that the predicted structure is stable.
Key words – Protein Structure Prediction, Artificial Immune Network, Energy, Antibody.
I. Introduction
The computation of the three-dimensional structure of a protein from its amino acid sequence is one of the most important and complex problems in molecular biology. The problem is of great importance because the function of a protein is intimately related to its three-dimensional structure: its tertiary structure determines its action. A computational technique capable of predicting the tertiary structures of long proteins would make it possible, for example, to develop new drugs with specific molecular structures capable of acting on toxic agents [1].
Solving this problem requires a methodology that can consistently and correctly determine the configuration of a folded protein without regard to the folding process. The problem is simple to state, but solving it is computationally intractable [2]. Consequently, a variety of algorithmic approaches have been proposed [3, 4], ranging from genetic algorithms (GAs) and simulated annealing (SA) to hybrids of deterministic and stochastic methods that use nonlinear optimization techniques and maximum likelihood approaches [5]; these approaches are reviewed in [6].
Many existing search algorithms attempt to solve the protein structure prediction (PSP) problem by exploring feasible structures called conformations. State-of-the-art results have been achieved by local search (LS) methods [7, 8] on the face-centered cubic (FCC) lattice hydrophobic-polar (HP) energy model, whereas genetic algorithms [9] and tabu search [10] have produced promising results on 2D and 3D hexagonal lattice HP models. In general, the success of GA and LS methods depends crucially on the balance between diversification and intensification of the search. Moreover, these algorithms often get stuck in local minima and, as a result, perform poorly on large proteins. Further progress with these algorithms requires addressing these issues appropriately.
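The HP energy model mentioned above scores a lattice conformation only by its hydrophobic contacts. A minimal Python sketch follows. It assumes the usual HP convention (an energy of −1 for every pair of H residues that occupy neighboring lattice sites but are not consecutive in the chain) and, for brevity, uses a 2D square lattice rather than the FCC or hexagonal lattices cited; `hp_energy` is a hypothetical helper, not code from [7–10].

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

# Unit steps to the four neighbors of a site on the 2D square lattice
# (assumption: FCC or hexagonal lattices would only change this list).
NEIGHBOR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Return the HP energy of a lattice conformation.

    Each pair of H residues that occupy neighboring lattice sites but are
    not consecutive in the chain (a topological contact) contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice site per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("the conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_of: Dict[Coord, int] = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in NEIGHBOR_STEPS:
            j = index_of.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbors and counts each contact once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# A four-residue chain folded into a square: residues 0 and 3 touch.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # prints -1
```

Searching over conformations (by LS, GA, or tabu search) then means looking for the self-avoiding placement with the lowest such energy.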
SA [11] is a random-search technique that exploits an analogy between the way a metal cools and freezes into a minimum-energy crystalline structure (the annealing process) and the search for a minimum in a general system. Chou and Carlacci [12] used simulated annealing to locate optimal starting conformations for energy minimization of both single and multiple polypeptide chains.
A modified version of the Pareto Archived Evolution Strategy (PAES) [13] was proposed in [14, 15] to predict protein structure, with the torsion angles of the protein as the decision variables. The original mutation stage of PAES, which consists of two steps (mutate and evaluate), is replaced by four steps: (1) a clonal expansion phase, (2) an affinity maturation phase, (3) an evaluation phase, and (4) a selection phase, in which the best solution is chosen. During the affinity maturation phase, the authors use two mutation operators designed specifically for this problem, and they analyze two mutation rates: (1) a static scheme and (2) a dynamic scheme in which the number of mutations decreases nonlinearly. The approach was compared with several single-objective and multiobjective techniques. Its immune-inspired variation scheme is what distinguishes it from the original PAES algorithm: the clonal expansion and the hypermutation operator of the clonal selection principle are used to strengthen the local search of PAES. The resulting I-PAES algorithm outperformed the original PAES on the protein structure prediction problems evaluated. The algorithm developed by Cutello et al. is thus essentially a modified clonal selection algorithm.
“COMPUTER SCIENCE & INFORMATION TECHNOLOGIES” (CSIT’2014), 18-22 NOVEMBER 2014, LVIV, UKRAINE
II. Method
Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, that are applied to problem solving [16, 17]. An AIS is driven by a set of input stimuli or patterns, by one or more fitness functions (or other means of evaluation), and by adaptation procedures that govern the dynamics and metadynamics of the system, i.e., how its behavior changes over time.
In optimization problems, an antibody is generally a vector of arguments $Ab = (x_1, x_2, \ldots, x_l)$, and the antigens are the optimality criteria $y_j$, expressed as functions $Ag = f(x_1, x_2, \ldots, x_l)$. The affinity values $g_j$ are computed from the criterion values $y_j$ and mapped onto the set of nonnegative numbers:
$$f: X \to Y, \qquad F: Y \to \mathbb{R}_{\ge 0}, \tag{1}$$
so that each criterion value $y_j$ yields an affinity $g_j = F(y_j) \ge 0$. The affinity function thus determines the degree of conformity of individuals to one another. In such problems the notion of distance cannot be used, because the best criterion value is not known in advance, and hence neither is the maximum affinity an individual can reach. The dynamics of the AIS are therefore controlled by relative affinity values or by the ranks of the individuals in the population. This approach is very close to the concept of fitness used in evolutionary algorithms, which predates the theory of artificial immune systems.
The artificial immune network (AIN) is an algorithm based on the AIS paradigm, inspired by the idiotypic network theory of immune system dynamics. The optimization version of the AIN has several useful features, such as dynamic variation of the population size, combined local and global search, and the ability to maintain any number of optima. These highly desirable characteristics come at the cost of a very large number of objective function evaluations.
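The optimization behavior described above (clonal expansion, affinity-dependent hypermutation, suppression of similar cells, and a population whose size changes) can be illustrated with a small Python sketch loosely modeled on opt-aiNet [17]. It is not the algorithm of this paper and makes several assumptions: antibodies are plain lists of angles in degrees, `toy_energy` is a hypothetical stand-in for a force field such as CHARMM, affinity is taken from the energy rank (as suggested above for problems where the best value is unknown), and every parameter name and value (`beta`, `suppression`, `n_newcomers`, and so on) is illustrative.

```python
import math
import random
from typing import Callable, List

Antibody = List[float]  # one entry per encoded angle, in degrees (assumption)


def wrap(angle: float) -> float:
    """Map an angle in degrees onto [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def distance(a: Antibody, b: Antibody) -> float:
    """Euclidean distance built from the shortest periodic angle differences."""
    return math.sqrt(sum(wrap(x - y) ** 2 for x, y in zip(a, b)))


def toy_energy(ab: Antibody) -> float:
    """Hypothetical stand-in for a force field: each angle has three
    equivalent minima (60, 180 and -60 degrees) with energy 0."""
    return sum(1.0 + math.cos(math.radians(3.0 * x)) for x in ab)


def immune_network(energy: Callable[[Antibody], float], n_angles: int,
                   n_cells: int = 10, n_clones: int = 5, beta: float = 0.05,
                   suppression: float = 30.0, n_newcomers: int = 3,
                   generations: int = 50, seed: int = 0) -> List[Antibody]:
    """Minimize `energy` with an opt-aiNet-like loop; returns cells by energy."""
    rng = random.Random(seed)

    def random_cell() -> Antibody:
        return [wrap(rng.uniform(-180.0, 180.0)) for _ in range(n_angles)]

    network = [random_cell() for _ in range(n_cells)]
    for _ in range(generations):
        # Rank-based affinity: lowest energy -> 1.0, highest energy -> 0.0.
        network.sort(key=energy)
        last = max(len(network) - 1, 1)
        matured = []
        for rank, cell in enumerate(network):
            affinity = 1.0 - rank / last
            # Hypermutation: the step size shrinks as affinity grows.
            step = math.exp(-affinity) / beta
            clones = [[wrap(x + rng.gauss(0.0, step)) for x in cell]
                      for _ in range(n_clones)]
            # Clonal selection: keep the parent unless a clone is better.
            matured.append(min(clones + [cell], key=energy))
        # Network suppression: drop cells lying too close to a better cell.
        matured.sort(key=energy)
        network = []
        for cell in matured:
            if all(distance(cell, kept) > suppression for kept in network):
                network.append(cell)
        # Random newcomers keep the search global; the population size varies.
        network += [random_cell() for _ in range(n_newcomers)]
    return sorted(network, key=energy)


cells = immune_network(toy_energy, n_angles=2)
print(len(cells), [round(a, 1) for a in cells[0]], round(toy_energy(cells[0]), 4))
```

Each generation costs roughly `(n_clones + 1)` energy evaluations per cell, which makes concrete the evaluation cost noted above. Because suppression removes any cell closer than `suppression` degrees to a better one, the refined cells tend to occupy different minima rather than collapsing onto one.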
For an antibody $Ab = (x_1, x_2, \ldots, x_l)$, the affinity is the composition of the two mappings in (1): $g = F(f(x_1, x_2, \ldots, x_l))$.
Formally, the immune network algorithm can be represented as [18]:
$$\mathrm{immNET} = (P^l, G^k, l, k, m_{Ab}, \delta, f, I, \tau, AG, AB, S, C, M, n, d, H, R),$$
where $P$ is the search space (shape space); $G^k$ is the representation space; $l$ is the length of the attribute vector; $k$ is the length of the cell receptor; $m_{Ab}$ is the size of the cell population; $\delta$ is the expression function; $f$ is the affinity function; $I$ is the function that initializes the population of network cells; $\tau$ is the stopping condition of the algorithm; $AG$ is the subset of antigens; $AB$ is the population of network cells (antibodies); $S$ is the selection operator; $C$ is the cloning operator; $M$ is the mutation operator; $n$ is the number of best cells selected for cloning; $d$ is the number of worst cells to be replaced by new ones; $H$ is the clonal removal operator; and $R$ is the network compression operator.
In this type of algorithm, the operator $H$ uses the death threshold $\sigma_d$ as a control parameter that reduces the size of the network by removing non-stimulated cells. The algorithm proceeds as follows:
1. Initialization.
1.1. Create the initial population of memory cells $M_R$.
1.2. Create the population of antibodies $AB$.
1.3. Encode the structure of each antibody using TINKER [19].
1.4. Calculate the energy of each antibody $Ab_i$ using Discovery Studio [20].
2. Antigen presentation. From this block on, the algorithm performs one pass for each antigen.
2.1. Affinity determination. The affinity of every memory cell $m_j \in M_R$ with the current antigen $Ag_i \in AG$ is determined, and the single best cell $m_b$ is selected.
2.2. Cloning. The selected memory cell is cloned in proportion to its affinity, generating a population of clones $M_C$.
2.3. Affinity maturation. The clones in $M_C$ are mutated, and the mutated clones are added to the antibody population: $AB = AB \cup M_C$. The affinity of the antibodies in $AB$ with the antigen $Ag_i$ is determined.
2.4. Metadynamics. Clonal removal of non-stimulated cells is performed according to the death threshold $\sigma_d$.
2.5. If the mean affinity of the population $AB$ is below a given threshold, part of the antibodies in $AB$ are cloned again, generating a new clone population $M_C$, and the algorithm returns to step 2.3.
2.6. The best antibody $Ab_b$ in $AB$ is selected as the candidate for the memory-cell population.
2.7. If $f(Ab_b, Ag_i) < f(m_b, Ag_i)$, go to step 3.
2.8. Add the antibody $Ab_b$ to the population $M_R$.
2.9. Cell-cell interaction. The affinity of interaction between all pairs of cells in $M_R$ is determined, i.e., $f(m_i, m_j)$ for $m_i, m_j \in M_R$.
2.10. Network compression. Cells of $M_R$ that recognize each other are removed according to the suppression threshold $\sigma_s$.
3. Check the stopping condition; if it does not hold (the optimal rate has not been reached), return to step 2.1.
The general scheme of the proposed algorithm is shown in Figure 1.
Fig. 1. Flowchart for immune network algorithm for protein structure prediction
The affinity of an antibody is its total energy $E_{\text{tot}}$, composed of the following terms: $E_{\text{bond}}$ – energy of bond stretching; $E_{\text{angle}}$ – energy of angle bending; $E_{\text{Urey-Bradley}}$ – Urey-Bradley energy; $E_{\text{dihedral}}$ – energy of improper dihedrals; $E_{\text{torsion}}$ – energy of torsion angles; $E_{\text{van der Waals}}$ – van der Waals energy; $E_{\text{electrostatic}}$ – electrostatic (charge-charge) energy. In more detail, the affinity function is computed as follows:
$$
\begin{aligned}
E_{\text{tot}} ={}& \sum_{\text{bonds}} K_b (b - b_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{Urey-Bradley}} k_{UB} (S - S_0)^2 \\
&+ \sum_{\text{torsions}} k_\chi \left[1 + \cos(n\chi - \delta)\right] + \sum_{\text{impropers}} k_\omega (\omega - \omega_0)^2 \\
&+ \sum_{\text{nonbonded } i<j} \left[ \varepsilon_{ij} \left( \left(\frac{R_{\min,ij}}{r_{ij}}\right)^{12} - 2\left(\frac{R_{\min,ij}}{r_{ij}}\right)^{6} \right) + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right]
\end{aligned}
$$
Antibody Representation [21]. The antibody representation varies from problem to problem: each antibody encodes one candidate structure, and together the antibodies form a set of candidate solutions.
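Because the encoding is problem-specific, a concrete example may help. The Python sketch below assumes a hypothetical antibody that stores, for each residue, the backbone torsion angles φ, ψ and ω plus zero or more side-chain χ angles (in degrees). It flattens them into the single vector that the immune operators act on and applies a hypermutation that changes a chosen number of angles. That number can be kept fixed or reduced over time, mirroring the static and dynamic mutation-rate schemes described for I-PAES [14, 15]. The class and function names, the trans default for ω, and the quadratic schedule in `n_mutations` are illustrative assumptions, not the encoding used by the authors.

```python
"""Hypothetical torsion-angle antibody and an angle hypermutation operator.

Assumptions for illustration only: angles are in degrees, each residue has
phi, psi, omega and a variable number of side-chain chi angles, and the
decreasing schedule in `n_mutations` is a made-up example.
"""
import random
from dataclasses import dataclass, field
from typing import List


def wrap(angle: float) -> float:
    """Map an angle in degrees onto [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


@dataclass
class Residue:
    phi: float
    psi: float
    omega: float = 180.0  # trans peptide bond (assumed default)
    chi: List[float] = field(default_factory=list)


def encode(residues: List[Residue]) -> List[float]:
    """Flatten the per-residue angles into the vector the operators act on."""
    vector: List[float] = []
    for r in residues:
        vector += [r.phi, r.psi, r.omega, *r.chi]
    return vector


def decode(vector: List[float], chi_counts: List[int]) -> List[Residue]:
    """Inverse of `encode`; chi_counts gives the number of chi angles per residue."""
    residues, i = [], 0
    for n_chi in chi_counts:
        phi, psi, omega = vector[i:i + 3]
        residues.append(Residue(phi, psi, omega, list(vector[i + 3:i + 3 + n_chi])))
        i += 3 + n_chi
    return residues


def n_mutations(generation: int, max_generations: int, length: int) -> int:
    """Example dynamic schedule: starts at `length`, decays quadratically to 1."""
    remaining = 1.0 - generation / max_generations
    return max(1, round(length * remaining ** 2))


def hypermutate(vector: List[float], count: int, step: float,
                rng: random.Random) -> List[float]:
    """Perturb `count` distinct, randomly chosen angles by up to +/- step degrees."""
    mutant = list(vector)
    for i in rng.sample(range(len(vector)), count):
        mutant[i] = wrap(mutant[i] + rng.uniform(-step, step))
    return mutant


rng = random.Random(42)
peptide = [Residue(-60.0, -45.0), Residue(-120.0, 130.0, chi=[60.0])]
vector = encode(peptide)
mutant = hypermutate(vector, n_mutations(0, 100, len(vector)), 15.0, rng)
print(decode(mutant, [len(r.chi) for r in peptide]))
```

A static scheme would simply pass a constant `count`; wrapping every mutated angle keeps the antibodies valid however large the perturbation.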
Cartesian coordinates, templates, torsion angles, side chains, and simplified residues have all been considered for encoding candidate solutions in PSP. In this work, torsion and side-chain angles were chosen to encode the antibodies, since both contribute substantially to the stability and conformation of the protein structure.
Affinity Evaluation. The population created in each generation is evaluated with the fitness (affinity) function to select the elite [19, 21]. The fitness reflects the quality of each antibody, and the energy value that determines the elite is calculated with the CHARMM force field, chosen as the objective function:
$$E_{\text{tot}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{Urey-Bradley}} + E_{\text{dihedral}} + E_{\text{torsion}} + E_{\text{van der Waals}} + E_{\text{electrostatic}}, \tag{7}$$
where $E_{\text{tot}}$ is the total energy (fitness). In the individual terms [14], $b$ is the bond length, $b_0$ is the equilibrium bond length, and $K_b$ is the bond force constant; $S$ is the distance between two atoms separated by two covalent bonds, $S_0$ is the equilibrium distance, and $k_{UB}$ is the Urey-Bradley force constant; $\theta$ is the valence angle, $\theta_0$ is the equilibrium angle, and $k_\theta$ is the valence angle force constant; $\chi$ is the dihedral (torsion) angle, $k_\chi$ is the dihedral force constant, $n$ is the multiplicity, and $\delta$ is the phase angle; $\omega$ is the improper angle, $\omega_0$ is the equilibrium improper angle, and $k_\omega$ is the improper force constant; $\varepsilon_{ij}$ is the Lennard-Jones well depth, $r_{ij}$ is the distance between atoms $i$ and $j$, $R_{\min,ij}$ is the minimum interaction radius, $q_i$ is the partial atomic charge, and $\varepsilon$ is the dielectric constant.
Conclusion
The proposed artificial immune network algorithm is promising: it contributes to the prediction of native-like structures while reducing the time and effort required. In addition, the energy of the predicted structure is minimized to a large extent, which indicates that the predicted structure is stable.
References
[1] Lehninger, A.L., Nelson, D.L., Cox, M.M. Principles of Biochemistry, 4th ed. Freeman, New York, 2005.
[2] M.M.L. Khimasia. NP-complete problems. http://www.tcm.phy.cam.ac.uk/~mmlk2/report13/node31.html, 1996.
[3] K. Lipkowitz and D. Boyd. Reviews in Computational Chemistry, vol. 10. VCH Publishers, Inc., New York, 1997.
[4] S. Schulze-Kremer. Genetic algorithms and protein folding. In: Protein Structure Prediction: Methods and Protocols, Methods in Molecular Biology, 143:175–222, 2000.
[5] J. Ecker, M. Kupferschmid, C. Lawrence, A. Reilly, and A. Scott. An application of nonlinear optimization in molecular biology. European Journal of Operational Research, 138(2):452–458, April 2002.
[6] J.D. Bryngelson, E.M. Billings, O.G. Mouritsen, J. Hertz, M.H. Jensen, K. Sneppen, and H. Flyvbjerg. From Interatomic Interactions to Protein Structure. In: Physics of Biological Systems: From Molecules to Species, vol. 480, pp. 80–116. Springer-Verlag, New York, 1997.
[7] Cebrián, M., Dotú, I., Van Hentenryck, P., Clote, P. Protein structure prediction on the face centered cubic lattice by local search. In: Proceedings of the 23rd National Conference on Artificial Intelligence, vol. 1, pp. 241–246 (2008).
[8] Dotú, I., Cebrián, M., Van Hentenryck, P., Clote, P. On Lattice Protein Structure Prediction Revisited. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2011).
[9] Hoque, M.T., Chetty, M., Sattar, A. Protein Folding Prediction in 3D FCC HP Lattice Model Using Genetic Algorithm. In: IEEE Congress on Evolutionary Computation, vol. 2007, pp. 4138–4145 (2007).
[10] Böckenhauer, H.J., Ullah, A.Z.M.D., Kapsokalivas, L., Steinhöfel, K. A Local Move Set for Protein Folding in Triangular Lattice Models. In: WABI, Lecture Notes in Computer Science, vol. 5251, pp. 369–381. Springer (2008).
[11] D. Bertsimas and J. Tsitsiklis. Simulated annealing. Statistical Science, 8(1):10–15, 1993.
[12] K.C. Chou and L. Carlacci. Simulated annealing approach to the study of protein structures. Protein Engineering, 1991.
[13] J.D. Knowles, R.A. Watson, and D.W. Corne. Reducing local optima in single-objective problems by multi-objectivization. In: Evolutionary Multi-Criterion Optimization, Lecture Notes in Computer Science, vol. 1993, pp. 269–283. Springer, Berlin, 2001.
[14] V. Cutello, G. Narzisi, and G. Nicosia. A multiobjective evolutionary approach to the protein structure prediction problem. Journal of The Royal Society Interface, 3(6):139–151, 2006.
[15] V. Cutello, G. Narzisi, and G. Nicosia. A class of Pareto archived evolution strategy algorithms using immune inspired operators for ab-initio protein structure prediction. In: Applications of Evolutionary Computing, Lecture Notes in Computer Science, pp. 54–63. Springer, Berlin, 2005.
[16] de Castro, L.N., Timmis, J. Artificial Immune Systems: A New Computational Intelligence Approach. Springer, 2002, pp. 57–58. ISBN 978-1-85233-594-6.
[17] L.N. de Castro and J. Timmis. An artificial immune network for multimodal function optimization. In: Proc. IEEE Congress on Evolutionary Computation, vol. 1, May 2002, pp. 699–704.
[18] Bidyuk, P.I., Lytvynenko, V.I., Fefelov, A.O. The formalization of methods for constructing artificial immune systems. Research Bulletin of National Technical University of Ukraine "Kyiv Polytechnic Institute" (Naukovi visti), 2007, No. 1, pp. 29–41 (in Ukrainian).
[19] Ponder, J. et al. TINKER: Software Tools for Molecular Design. Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, 1998.
[20] Accelrys Software Inc. Discovery Studio modeling environment, release 3.1. San Diego (CA): Accelrys Software Inc.; 2012.
[21] Venkatesan A, Gopal J, Candavelou M, Gollapalli S, Karthikeyan K. Computational approach for protein structure prediction. Healthc Inform Res. 2013;19(2):137–147.
[22] Cutello, V. and Nicosia, G. The clonal selection principle for in silico and in vitro computing. In: Recent Developments in Biologically Inspired Computing (ed. L.N. de Castro & F.J. Von Zuben), pp. 104–145. Hershey, PA: Idea Group Publishing, 2004.
[23] http://dasher.wustl.edu/tinker/