Protein Structure Prediction by Applying an Artificial Immune
Network
Volodymyr Lytvynenko¹, Alexander Kornelyuk², Konstantin Odynets², Dmytro Peleshko³
¹Informatics and Computer Science Department, Kherson National Technical University, Bereslavskoe Shose, 24, Kherson,
73008, UKRAINE, E-mail: immun56@gmail.com
²Department of Protein Engineering and Bioinformatics, Institute of Molecular Biology and Genetics NAS of Ukraine, 150, Acad.
Zabolotnogo Str., Kyiv, 03143, UKRAINE, E-mail: kornelyuk@imbg.org.ua
³Department of Information Technology Publishing, Lviv Polytechnic National University, S. Bandery Str., 12, Lviv, 79013,
UKRAINE, E-mail: dpelesko@gmail.com
Abstract – In this work, we propose to use an artificial
immune network algorithm for the prediction of protein
tertiary structure. Among many other approaches, the artificial
immune network is found to be a promising cooperative
computational method for solving protein structure prediction
in a reasonable time. To automate the choice of parameter
values, self-organization is adopted to optimize the prediction
process. Torsion angles, the local structural parameters that
define the protein backbone, are used to encode the antibody,
which enhances the quality of the conformation. Peptides are
used to gauge the validity of the proposed algorithm. The
predicted structure shows clear improvements in the root mean
square deviation when superimposed on the native structure,
which indicates the overall performance of the algorithm. The
proposed algorithm is promising and contributes to the
prediction of a native-like structure while easing the
constraints on time and effort. In addition, the energy of the
predicted structure is minimized to a greater extent, which
indicates the stability of the protein.
Key words – Protein Structure Prediction, Artificial Immune
Network, Energy, Antibody.
I. Introduction
The computation of the three-dimensional structures of
proteins from their amino acid sequences is one of the
most important and complex problems in molecular biology.
This problem is of extreme importance, since the
functionality of a protein is intimately related to its
three-dimensional structure, which means that its tertiary
structure determines its action. A computational
technique capable of predicting the tertiary structures of long
proteins would make it possible, for example, to develop
new drugs with specific molecular structures capable of
acting on toxic agents [1]. Solving this problem
involves finding a methodology that can consistently and
correctly determine the configuration of a folded protein
without regard to the folding process. The problem is
simply stated; however, solving it is intractable [2]. Thus, a
variety of algorithmic approaches have been proposed [3,
4], ranging from genetic algorithms (GAs) and simulated
annealing (SA) to hybrids between deterministic and
stochastic methodologies using nonlinear optimization
techniques and maximum likelihood approaches [5],
recently reviewed in [6]. A large number of existing
search algorithms attempt to solve the protein structure
prediction (PSP) problem by exploring feasible structures
called conformations. State-of-the-art results have been
achieved by local search (LS) methods [7, 8] on the
face-centered cubic (FCC) lattice based hydrophobic-polar (HP)
energy model, whereas genetic algorithms [9] and tabu
search [10] have found promising results on 2D and 3D
hexagonal lattice based HP models. In general, the success
of GA and LS methods depends crucially on the balance
between diversification and intensification of the search.
Moreover, these algorithms often get stuck in local
minima. As a result, they perform poorly on large
proteins. Any further progress with these algorithms requires
addressing the above issues appropriately. SA [11] is a
random-search technique that exploits an analogy
between the way in which a metal cools and freezes into a
minimum-energy crystalline structure (the annealing
process) and the search for a minimum in a more general
system. Chou and Carlacci [12] introduced a simulated
annealing method to locate the optimal starting
conformations for energy minimization of both single and
multiple polypeptide chains. A modified version of the Pareto
Archived Evolution Strategy (PAES) [13], called I-PAES, has
been proposed in [14, 15] to predict protein structure.
The torsion angles of the protein are taken as the
decision variables of the problem. The original mutation
stage of PAES, which consists of two steps (mutate and
evaluate), is replaced by four steps: (1) a clonal expansion
phase, (2) an affinity maturation phase, (3) an evaluation
phase, and (4) a selection phase (the best solution is
chosen). During the affinity maturation phase, the authors
adopt two mutation operators that are specifically
designed for the problem. Two mutation-rate schemes are
analyzed: (1) a static scheme and (2) a dynamic
scheme in which the number of mutations decreases
nonlinearly. The approach is compared with several
single-objective and multi-objective techniques. The
immunity-based idea is implemented in the variation scheme,
which makes it different from the original PAES algorithm.
In particular, clonal expansion and the hypermutation
operator from the clonal selection principle are employed
to enhance the local search operation of the original PAES
algorithm. I-PAES outperformed the original PAES algorithm
on protein structure prediction problems. The algorithm
developed by Cutello et al. is essentially a modified clonal
selection algorithm.
“COMPUTER SCIENCE & INFORMATION TECHNOLOGIES” (CSIT’2014), 18-22 NOVEMBER 2014, LVIV, UKRAINE
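The four-phase variation step described above for I-PAES (clonal expansion, affinity maturation, evaluation, selection) can be sketched in Python as follows. This is a minimal illustration and not the implementation of [14, 15]: the exponential decay law used for the dynamic mutation scheme, the Gaussian perturbation of torsion angles, the choice to keep the parent among the candidates, and all names (`n_mutations`, `hypermutate`, `variation_step`) are assumptions made for the example.

```python
import math
import random


def n_mutations(generation, max_generations, n_max, dynamic=True):
    """Number of mutations applied to each clone.

    Static scheme: always n_max.
    Dynamic scheme: decays from n_max towards 1. The exponential law is an
    assumption; the text only states that the decrease is nonlinear.
    """
    if not dynamic:
        return n_max
    decay = math.exp(-4.0 * generation / max_generations)
    return max(1, round(n_max * decay))


def hypermutate(angles, n_mut, rng, sigma=10.0):
    """Perturb n_mut randomly chosen torsion angles (degrees), wrapped to [-180, 180)."""
    mutant = list(angles)
    for i in rng.sample(range(len(mutant)), min(n_mut, len(mutant))):
        mutant[i] = (mutant[i] + rng.gauss(0.0, sigma) + 180.0) % 360.0 - 180.0
    return mutant


def variation_step(parent, energy, generation, max_generations, rng,
                   n_clones=10, n_max=4, dynamic=True):
    """One four-phase step; returns the lowest-energy candidate."""
    k = n_mutations(generation, max_generations, n_max, dynamic)
    clones = [list(parent) for _ in range(n_clones)]        # (1) clonal expansion
    mutants = [hypermutate(c, k, rng) for c in clones]      # (2) affinity maturation
    scored = [(energy(m), m) for m in mutants]              # (3) evaluation
    scored.append((energy(parent), list(parent)))           # parent kept (assumption)
    return min(scored, key=lambda s: s[0])[1]               # (4) selection
```

Because the parent competes with its mutated clones, the energy of the returned candidate never exceeds that of the parent; dropping that line would give a non-elitist variant.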
II. Method
Artificial Immune Systems (AIS) are adaptive
systems, inspired by theoretical immunology and
observed immune functions, principles and models, which
are applied to problem solving [16, 17].
The environment of an AIS is simulated by a set of input
stimuli or patterns, one or more fitness function(s), or other
means, and procedures of adaptation govern the dynamics
and metadynamics of the system, i.e., how its behavior
varies over time.
In optimization problems, the generalized form of an
antibody is a vector of arguments $Ab = (x_1, x_2, \dots, x_l)$,
and the antigens are the optimality criteria $y_j$, expressed as
functions $Ag = f(x_1, x_2, \dots, x_l)$. The affinity values $g_j$
are calculated from the criteria values $y_j$ and mapped into
the set of nonnegative numbers:
$$f: X \to \mathbb{R}, \quad F: \mathbb{R} \to \mathbb{R}^{+} \qquad (1)$$
Thus, there is some affinity function that determines the
degree of conformity of individuals to each other. In such
problems we cannot operate with the notion of distance,
because the best value of the criterion is not known in
advance, and therefore the maximum achievable degree of
conformity of individuals is also unknown. Thus, the
dynamics of the AIS is controlled by the relative affinity
values or by the rank of individuals in the set. This approach is
very close to the concept of fitness used in evolutionary
algorithms, which were developed somewhat earlier than the
theory of artificial immune systems.
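Since only relative affinity or rank matters here, one possible realization of the mapping $F$ is a rank transform. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name `rank_affinity`, the scaling of affinities to [0, 1], and the default that lower criterion values are better (as for an energy) are all hypothetical choices.

```python
def rank_affinity(criterion_values, minimize=True):
    """Map raw criterion values y_j to nonnegative affinities g_j by rank.

    The best individual gets affinity 1.0 and the worst 0.0. Ties are broken
    by position in the input, so equal values may get different affinities.
    """
    n = len(criterion_values)
    if n == 1:
        return [1.0]
    # Indices ordered from best to worst.
    order = sorted(range(n), key=lambda j: criterion_values[j],
                   reverse=not minimize)
    affinities = [0.0] * n
    for rank, j in enumerate(order):
        affinities[j] = 1.0 - rank / (n - 1)
    return affinities


# Three candidate structures with energies in arbitrary units.
print(rank_affinity([-120.5, 30.0, -45.2]))  # [1.0, 0.0, 0.5]
```

The transform needs no knowledge of the best attainable criterion value, which is the point made in the paragraph above.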
The artificial immune network (AIN) is an algorithm
based on the AIS paradigm. It was inspired by the
idiotypic network theory, which explains the dynamics of
the immune system. The optimization version of
the AIN presents a number of interesting features, such as
dynamic variation of the population size, local and global
search, and the ability to maintain any number of optima.
These are highly desirable characteristics, but they are
obtained at the cost of a very large number of objective
function evaluations. Formally, with the affinity computed as
$g = F(f(x_1, x_2, \dots, x_l))$, the immune network algorithm
can be represented as [18]:
$$immNET = (P^l, G^k, l, k, m_{Ab}, \delta, f, I, \tau, AG, AB, S, C, M, n, d, H, R),$$
where $P^l$ is the search space (shape space); $G^k$ is the
representation space; $l$ is the length of the attribute
vector; $k$ is the length of the cell receptor; $m_{Ab}$ is the
size of the cell population; $\delta$ is the expression function;
$f$ is the affinity function; $I$ is the function that initializes
the initial population of network cells; $\tau$ is the
termination condition of the algorithm; $AG$ is the subset of
antigens; $AB$ is the population of network cells (antibodies);
$S$ is the selection operator; $C$ is the cloning operator;
$M$ is the mutation operator; $n$ is the number of best
cells selected for cloning; $d$ is the number of worst
cells to be replaced by new ones; $H$ is the clonal removal
operator; and $R$ is the network compression operator.
In this type of algorithm, the operator $H$ uses the death
threshold coefficient $\sigma_d$ as the controlling parameter,
reducing the network size by removing unstimulated cells.
The algorithm consists of the following steps:
1. Initialization.
1.1. Creation of the initial population of memory cells $M_R$.
1.2. Creation of the population of antibodies $AB$.
1.3. Encoding of the structure for each of the $n$ antibodies
using TINKER [19].
1.4. Energy calculation for each $Ab_i$ using Discovery
Studio [20].
2. Antigen presentation. Starting from this block, the
algorithm performs one pass for every antigen.
2.1. Determination of affinity. The affinity to the current
antigen $Ag_i$, $Ag_i \in AG$, is determined for all memory
cells $m_j$, $m_j \in M_R$, and the best cell $m_b$ is selected.
2.2. Cloning. The selected memory cell is cloned
proportionally to its affinity, generating the population
of clones $M_C$.
2.3. Affinity maturation. The clones from $M_C$ are mutated.
The changed clones are added to the population of
antibodies, i.e., $AB = AB \cup M_C$. The affinity of the
antibody population $AB$ to the antigen $Ag_i$ is
determined.
2.4. Metadynamics. Clonal removal of nonstimulated cells
is done in accordance with the threshold $\sigma_d$.
2.5. Repeated cloning of a part of the antibodies from the
population $AB$, generating the clone population
$M_C$, and return to item 2.3 if the mean affinity of the
population $AB$ is lower than the given threshold value.
2.6. The candidate cell $Ab_b$ (the best antibody) is selected
from the population $AB$ for the population of memory
cells.
2.7. Go to item 3 if $f(Ab_b, Ag_i) \le f(m_b, Ag_i)$.
2.8. Addition of the antibody $Ab_b$ to the population $M_R$.
2.9. Cell-cell interaction. The affinity of interaction of all
cells of the population $M_R$ with each other is determined,
i.e., $f(m_i, m_j)$, $m_i, m_j \in M_R$.
2.10. Network compression. Cells of the population $M_R$
that recognize each other are removed according to the given
threshold $\sigma_S$.
3. Verification of the stopping condition of the
algorithm and return to item 2.1 if the condition does not
hold (the optimal rate has not been reached).
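The control flow of the steps above can be illustrated with a much simplified Python sketch. It is not the authors' implementation: memory cells and antibodies are merged into one population, the energy function plays the role of the single antigen, every cell receives the same number of clones instead of an affinity-proportional number, and the mutation law, the parameter defaults, and all names (`relative_affinity`, `ainet_minimize`, `sigma_d`, `sigma_s`) are assumptions chosen for the example. In the paper the energy comes from TINKER and Discovery Studio; here any Python callable is accepted.

```python
import math
import random


def relative_affinity(energies):
    """Scale energies to affinities in [0, 1]; the lowest energy gets 1."""
    lo, hi = min(energies), max(energies)
    span = (hi - lo) or 1.0
    return [1.0 - (e - lo) / span for e in energies]


def ainet_minimize(energy, dim, rng, pop_size=20, n_clones=10, n_new=5,
                   sigma_d=0.2, sigma_s=15.0, max_iter=100,
                   low=-180.0, high=180.0):
    """Simplified immune-network search; returns (best_cell, best_energy)."""
    def random_cell():
        return [rng.uniform(low, high) for _ in range(dim)]

    def mutate(cell, strength):
        return [min(high, max(low, x + rng.gauss(0.0, strength))) for x in cell]

    cells = [random_cell() for _ in range(pop_size)]          # step 1
    for _ in range(max_iter):                                 # step 3 (fixed budget)
        aff = relative_affinity([energy(c) for c in cells])   # step 2.1
        matured = []
        for cell, a in zip(cells, aff):                       # steps 2.2-2.3
            strength = 1.0 + 20.0 * (1.0 - a)   # weaker cells mutate more
            clones = [mutate(cell, strength) for _ in range(n_clones)]
            matured.append(min(clones + [cell], key=energy))
        # Step 2.4: clonal removal of cells below the death threshold.
        aff = relative_affinity([energy(c) for c in matured])
        survivors = [c for c, a in zip(matured, aff) if a >= sigma_d]
        # Step 2.10: network compression, keep the better of two close cells.
        survivors.sort(key=energy)
        network = []
        for cell in survivors:
            if all(math.dist(cell, kept) > sigma_s for kept in network):
                network.append(cell)
        # Replace removed cells with random newcomers.
        cells = network[:pop_size] + [random_cell() for _ in range(n_new)]
    best = min(cells, key=energy)
    return best, energy(best)


# Toy usage: three torsion-like angles with a minimum at 60 degrees each.
toy_energy = lambda cell: sum((x - 60.0) ** 2 for x in cell)
best, e_best = ainet_minimize(toy_energy, dim=3, rng=random.Random(1), max_iter=30)
```

The best cell always survives both removal stages, so the best energy in the population cannot get worse from one iteration to the next.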
The general scheme of the proposed algorithm is shown
in Figure 1.
Fig. 1. Flowchart for immune network algorithm for protein
structure prediction
Antibody Representation [21]. The antibody
representation varies for different problems; each antibody,
as a set of encoded parameters, constitutes an individual
structure, i.e., one candidate solution. Cartesian coordinates,
templates, torsion angles, side chains, and simplified residues
have been considered for chromosome representation in PSP.
In this work, torsion and side-chain angles are assumed to be
the best choice for encoding the chromosomes (antibodies),
since both contribute strongly to the consistency and
conformation of the protein structure.
Affinity Evaluation. The population created in each
generation is evaluated by a fitness function to select
the elite [19, 21]. The fitness reflects the quality of each
chromosome, and the energy value that determines the
elite is calculated using the CHARMM force field, chosen
as the objective function:
$$E_{tot} = E_{bond} + E_{angle} + E_{\text{Urey-Bradley}} + E_{dihedral} + E_{torsion} + E_{\text{Van der Waals}} + E_{electrostatic},$$
where:
$E_{tot}$ – total energy (fitness);
$E_{bond}$ – energy of bond stretching;
$E_{angle}$ – energy of angle bending;
$E_{\text{Urey-Bradley}}$ – Urey-Bradley energy;
$E_{dihedral}$ – energy of improper dihedrals;
$E_{torsion}$ – energy of torsion angles;
$E_{\text{Van der Waals}}$ – van der Waals energy;
$E_{electrostatic}$ – charge-charge (electrostatic) energy.
In more detail, the affinity function is computed as
follows:
$$
E_{tot} = \underbrace{\sum_{bond} K_b (b - b_0)^2}_{E_1}
+ \underbrace{\sum_{angle} k_\theta (\theta - \theta_0)^2}_{E_2}
+ \underbrace{\sum_{\text{Urey-Bradley}} k_{\text{Urey-Bradley}} (S - S_0)^2}_{E_3}
+ \underbrace{\sum_{torsion} k_\varphi \left(1 + \cos(n\varphi - \delta)\right)}_{E_4}
+ \underbrace{\sum_{dihedral} K_{dihedral} (\omega - \omega_0)^2}_{E_5}
+ \sum_{\text{non-bond}} \left\{ \underbrace{\varepsilon_{ij} \left[ \left( \frac{R_{min,ij}}{r_{ij}} \right)^{12} - \left( \frac{R_{min,ij}}{r_{ij}} \right)^{6} \right]}_{E_6}
+ \underbrace{\frac{q_i q_j}{\varepsilon \, r_{ij}}}_{E_7} \right\}
$$
where [14]:
$b$ is the bond length, $b_0$ is the bond equilibrium
distance, and $K_b$ is the bond force constant.
$\theta$ is the valence angle, $\theta_0$ is the equilibrium angle, and
$k_\theta$ is the valence angle force constant.
$S$ is the distance between two atoms separated by two
covalent bonds, $S_0$ is the equilibrium distance, and
$k_{\text{Urey-Bradley}}$ is the Urey-Bradley force constant.
$\varphi$ is the dihedral or torsion angle, $k_\varphi$ is the dihedral force
constant, $n$ is the multiplicity, and $\delta$ is the phase
angle.
$\omega$ is the improper angle, $\omega_0$ is the equilibrium improper
angle, and $K_{dihedral}$ is the improper force constant.
$\varepsilon_{ij}$ is the Lennard-Jones well depth, $r_{ij}$ is the distance
between atoms $i$ and $j$, $R_{min,ij}$ is the minimum
interaction radius, $q_i$ and $q_j$ are the partial atomic charges, and
$\varepsilon$ is the dielectric constant.
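To make the link between the torsion-angle antibody and the energy-based affinity concrete, the following Python sketch encodes an antibody as a list of torsion angles in degrees and evaluates only the torsion term $\sum k_\varphi (1 + \cos(n\varphi - \delta))$ of the force field. It is a hedged illustration: the parameter values are hypothetical placeholders rather than CHARMM parameters, the names `TorsionParam`, `wrap_deg`, and `torsion_energy` are invented for the example, and the full energy in the paper is obtained with TINKER and Discovery Studio rather than with code of this kind.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TorsionParam:
    k_phi: float      # dihedral force constant (hypothetical value, kcal/mol)
    n: int            # multiplicity
    delta_deg: float  # phase angle in degrees


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def torsion_energy(antibody, params):
    """Sum of k_phi * (1 + cos(n * phi - delta)) over all encoded torsions."""
    if len(antibody) != len(params):
        raise ValueError("one parameter set is needed per torsion angle")
    total = 0.0
    for phi_deg, p in zip(antibody, params):
        phi = math.radians(wrap_deg(phi_deg))
        delta = math.radians(p.delta_deg)
        total += p.k_phi * (1.0 + math.cos(p.n * phi - delta))
    return total


# Antibody for a two-residue fragment: (phi, psi) per residue, in degrees.
antibody = [-60.0, -45.0, -120.0, 130.0]
params = [TorsionParam(k_phi=0.2, n=3, delta_deg=0.0)] * len(antibody)
energy = torsion_energy(antibody, params)
```

A lower value of `energy` corresponds to a higher affinity, so such a function could be passed directly to a minimizing search over antibodies.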
Conclusion
The proposed artificial immune network algorithm
is promising and contributes to the prediction
of a native-like structure while easing the constraints
on time and effort. In addition, the energy of
the predicted structure is minimized to a greater extent,
which indicates the stability of the protein.
References
[1] Lehninger, A.L., Nelson, D.L., Cox, M.M. Principles
of Biochemistry 4 ed., Freeman, New York, 2005.
[2] M.M.L. Khimasia. NP complete problems.
http://www.tcm.phy.cam.ac.uk/~mmlk2/report13/node31.html,
1996.
[3] K. Lipkowitz and D. Boyd. Reviews in Computational
Chemistry, vol. 10. VCH Publishers, Inc, 333 7th
Avenue, New York, New York, 1997.
[4] S. Schulze-Kremer. Genetic algorithms and protein
folding. Methods in Molecular Biology, 143:175–222,
2000. Protein Structure Prediction: Methods and
Protocols.
[5] J. Ecker, M. Kupferschmid, C. Lawrence, A. Reilly,
and A. Scott. An application of nonlinear optimization
in molecular biology. European Journal Of
Operational Research, 138(2):452–458, April 2002.
Department of Mathematical Sciences, Rensselaer
Polytechnic Institute, 110 8th Street, Troy, NY 12180-3590, USA.
[6] J.D. Bryngelson, E.M. Billings, O.G. Mouritsen, J.
Hertz, M.H. Jensen, K. Sneppen, and H. Flyvbjerg.
From Interatomic Interactions to Protein Structure,
vol. 480, chapter Physics of Biological Systems:
From Molecules to Species, pp. 80–116. Springer-Verlag New York, 1997.
[7] Cebrián, M., Dotú, I., Van Hentenryck, P., Clote, P.
Protein structure prediction on the face centered cubic
lattice by local search. In: Proceedings of the 23rd
National Conference on Artificial Intelligence. Vol. 1.
pp. 241–246 (2008).
[8] Dotú, I., Cebrián, M., Van Hentenryck, P., Clote, P.
On Lattice Protein Structure Prediction Revisited.
IEEE Transactions on Comp. Bio. and Bioinformatics
(2011).
[9] Hoque, M.T., Chetty, M., Sattar, A. Protein Folding
Prediction in 3D FCC HP Lattice Model Using
Genetic Algorithm. vol. 2007, pp. 4138–4145. IEEE
Congress on Evolutionary Computation (2007).
[10] Böckenhauer, H.J., Ullah, A.Z.M.D., Kapsokalivas,
L., Steinhöfel, K. A Local Move Set for Protein
Folding in Triangular Lattice Models. In: WABI.
Lecture Notes in Computer Science, vol. 5251, pp.
369–381. Springer (2008)
[11] D. Bertsimas and J. Tsitsiklis. Simulated annealing.
Statistical Science, 8(1):10–15, 1993.
[12] K.C. Chou and L. Carlacci. Simulated annealing
approach to the study of protein structures. Protein
Engineering, 1991.
[13] J.D. Knowles, R.A. Watson, and D.W. Corne.
Reducing local optima in single-objective problems by
multi-objectivization. In Evolutionary Multi-Criterion
Optimization, vol. 1993 of Lecture Notes in Computer
Science, pp. 269–283. Springer Berlin, 2001.
[14] V. Cutello, G. Narzisi, and G. Nicosia. A multi-objective
evolutionary approach to the protein
structure prediction problem. Journal of The Royal
Society Interface, 3(6):139–151, 2006.
[15] V. Cutello, G. Narzisi, and G. Nicosia. A class of
pareto archived evolution strategy algorithms using
immune inspired operators for ab-initio protein
structure prediction. In Applications of Evolutionary
Computing, Lecture Notes in Computer Science, pp.
54–63. Springer Berlin, 2005.
[16] de Castro, L.N.; Timmis, J. (2002). Artificial
Immune Systems: A New Computational Intelligence
Approach. Springer. pp. 57–58. ISBN 978-1-85233-594-6.
[17] L.N. de Castro and J. Timmis. An artificial immune
network for multimodal function optimization. In
Proc. IEEE Congr. Evolutionary Computation, vol. 1,
May 2002, pp. 699–704.
[18] Bidyuk P.I., Lytvynenko V.I., Fefelov A.O. The
formalization of methods for constructing artificial
immune systems. Research Bulletin of National
Technical University of Ukraine "Kyiv Polytechnic
Institute" (Naukovi visti), 2007, № 1, pp. 29–41 (in
Ukrainian).
[19] Ponder, J. et al. TINKER: Software Tools for
Molecular Design. Department of Biochemistry and
Molecular Biophysics, Washington University School
of Medicine, St. Louis, MO, 1998.
[20] Accelrys Software Inc. Discovery Studio modeling
environment, release 3.1. San Diego (CA): Accelrys
Software Inc.; 2012.
[21] Venkatesan A, Gopal J, Candavelou M, Gollapalli S,
Karthikeyan K. Computational approach for protein
structure prediction. Healthc Inform Res.
2013;19(2):137–147.
[22] Cutello, V. and Nicosia, G. 2004 The clonal selection
principle for in silico and in vitro computing. In
Recent developments in biologically inspired
computing (ed. L.N. de Castro & F.J. Von Zuben), pp.
104–145. Hershey, PA: Idea Group Publishing.
[23] http://dasher.wustl.edu/tinker/