Protein folding simulations of 2D HP model by the genetic algorithm

Computational Biology and Chemistry 34 (2010) 137–142
Computational Biology and Chemistry
journal homepage: www.elsevier.com/locate/compbiolchem
Research article
Protein folding simulations of 2D HP model by the genetic algorithm based on
optimal secondary structures
Chenhua Huang, Xiangbo Yang∗, Zhihong He
MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, South China Normal University, Zhongshan Road, Guangzhou 510631, China
Article info
Article history:
Received 12 February 2009
Received in revised form 20 December 2009
Accepted 27 April 2010
Keywords:
Protein folding
HP model
Genetic algorithm
Secondary structure
Abstract
In this paper, we make four improvements to the evolutionary Monte Carlo (EMC) algorithm and propose a genetic algorithm based on optimal secondary structures (GAOSS) to efficiently predict protein folding conformations in the two-dimensional hydrophobic–hydrophilic (2D HP) model. Nine benchmarks are tested to verify the effectiveness of the proposed approach, and the results show that, for the listed benchmarks, GAOSS finds the best solutions reported so far. This indicates that reasonable, effective, and compact secondary structures (SSs) can avoid blind searches and significantly reduce computing time. In addition, as examples, we discuss the diversity of protein ground state conformations (GSCs) for the 24-mer and 85-mer sequences. Several GSCs are found by GAOSS, and some of the conformations are quite different from each other, which would be useful for the design of protein molecules. GAOSS would be an efficient tool for protein structure prediction (PSP).
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Protein folding has attracted much attention, and we give an incomplete list of key results here. Levinthal proposed a famous paradox (Levinthal, 1969, 1968; Zwanzig et al., 1992): how can a protein find its native state without a globally exhaustive search? Wetlaufer (1973) pointed out that proteins fold much too fast (by at least tens of orders of magnitude) to involve an exhaustive search. Experimental methods for studying protein folding include nuclear magnetic resonance, fast kinetics, etc., and experimental results show that there exists “cooperativity” in protein folding (Creighton, 1978; Kim and Baldwin, 1990). Anfinsen (1973) proposed that a protein in its natural environment folds into, or vibrates around, a unique three-dimensional structure, the natural conformation. This indicates that a protein structure is determined by its amino acid sequence and can be predicted from that sequence alone. Finding the lowest energy tertiary structure of a protein from its amino acid sequence thus becomes the main task of protein structure prediction (PSP), and this problem has been recognized to be “NP-hard” (Crescenzi et al., 1998). More recently, James and Tawfik (2003) revisited the conformational diversity of proteins and outlined a hypothesis based on the “new view” of proteins, whereby one sequence can adopt multiple structures and functions.
Proteins display complicated structures, which are categorized into four levels. Prediction of the quaternary and tertiary structures is based on secondary structures (SSs). Using SSs to predict the structures and functions of proteins becomes a vital problem. The feasibility of cutting protein sequences into fragments, constructing fragment databases, assigning structures to these fragments, and assembling the substructures in order to predict the structure of proteins has been demonstrated in the literature with various methods and fragment sizes (Gilad et al., 2006; Rohl et al., 2004; Ruczinski et al., 2002; Skolnick et al., 2000, 2003; Zhang and Skolnick, 2004, 2005). In this paper we focus on the two-dimensional hydrophobic–hydrophilic (2D HP) model (Lau and Dill, 1989), a widely studied abstract model that has been used by chemists to evaluate new hypotheses of protein structure formation. This is a free energy model, in which one assumes that the major contribution to the free energy of the natural conformation of a protein is due to the interactions between hydrophobic amino acids, which tend to form a core in the spatial structure, while hydrophilic amino acids shield the core from the surrounding solvent.

∗ Corresponding author. Tel.: +86 139 2887 8165; fax: +86 20 8521 5536.
E-mail address: xbyang@scnu.edu.cn (X. Yang).
1476-9271/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiolchem.2010.04.002

To search for the ground state conformation (GSC), researchers have presented a molecular dynamics method (Levitt, 1983), a statistical mechanical model (Alm and Baker, 1999), and several probabilistic search algorithms for the 2D HP model. Well-known examples of the latter include the Monte Carlo (MC) method (Unger and Moult, 1993), the evolutionary Monte Carlo (EMC) method (Liang and Wong, 2001), simulated annealing and the genetic algorithm (GA) (Unger and Moult, 1993; Custódio et al., 2004; Jiang et al., 2003; Cox and Johnston, 2006), the hybrid of GA and tabu search (GTS) (Jiang et al., 2003), and the elastic net algorithm combined with a local search method (ENLS) (Guo et al., 2006). To check the effectiveness of these algorithms, many protein sequences of lattice models have been calculated. It is found that these optimization algorithms are not very fast and that, for long protein chains, the optimal GSC sometimes cannot be obtained.

Fig. 1. The conformation of the 2D HP model for the 20-mer protein with the sequence (HPHPPHHPHPPHPHHPPHPH), where the free energy of the protein sequence is E = −9 and hydrophilic and hydrophobic amino acids are drawn with different symbols.

In this paper we propose a GA based on optimal SSs (GAOSS) and use it to calculate nine benchmarks. The results show that GAOSS not only speeds up the search for protein GSCs but also increases the chance of finding more protein GSCs. GAOSS would be an efficient tool for the PSP of the 2D HP model.
This paper is organized as follows. Section 2 introduces the 2D HP model. In Section 3 we describe our GAOSS method in detail. The results and discussions are presented in Section 4, and a brief summary is given in Section 5.
2. 2D HP model
It is well known that hydrophobicity is one of the key factors that determine the folding of an amino acid chain. In the 2D HP model (Lau and Dill, 1989), the 20 kinds of amino acids are divided into two classes according to their hydrophobicity: H (hydrophobic/nonpolar) and P (hydrophilic/polar) residues. Hydrophobic amino acids tend to come together and form a compact core that excludes water. Based on this abstraction, a protein sequence can be regarded as a string of binary characters, H and P, and we define S = S1 S2 . . . Sn as a protein chain with n amino acids, where the character Si (i = 1, 2, . . . , n) denotes the ith amino acid and is H (P) when the corresponding amino acid is a hydrophobic (hydrophilic) residue. A protein sequence is arranged as a 2D self-avoiding walk on a lattice, where consecutive characters in the sequence occupy adjacent grid points and no grid point is occupied by more than one character. The energy of a protein conformation is determined by the number of topological contacts, i.e., pairs of hydrophobic amino acids that occupy adjacent grid points but are not neighbors in the sequence. The free energy between the ith and jth amino acids is given as follows:
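\[
\varepsilon_{ij} =
\begin{cases}
-1.0, & \text{the pair of H and H residues}, \\
0.0, & \text{others}.
\end{cases}
\tag{1}
\]
The free energy for the protein sequence can be obtained as follows:
\[
E = \sum_{i,j} r_{ij}\, \varepsilon_{ij},
\tag{2}
\]
where the parameter
\[
r_{ij} =
\begin{cases}
1, & S_i \text{ and } S_j \text{ are adjacent but not neighbor amino acids}, \\
0, & \text{others}.
\end{cases}
\tag{3}
\]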
So the problem of the optimization of protein folding is transformed
into the calculation of the minimal free energy of the protein folding conformation. As an example of the 2D HP model, we show the
conformation of the 20-mer protein with the sequence of (HPHPPHHPHPPHPHHPPHPH) in Fig. 1, where the free energy for the
protein sequence is E = −9.
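The energy definition above can be illustrated with a minimal Python sketch. It assumes a conformation is given as a list of integer lattice coordinates in chain order and counts each H–H contact once; the function name `hp_energy` and this input format are illustrative choices, not taken from the paper.

```python
def hp_energy(sequence, coords):
    """Return the HP-model free energy of one 2D lattice conformation.

    sequence: string over {'H', 'P'}.
    coords:   list of (x, y) integer lattice points, one per residue,
              in chain order (assumed input format, not from the paper).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up visits every lattice edge exactly once,
        # so each contact is counted a single time.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            # A contact needs two H residues that are lattice-adjacent
            # but not consecutive along the chain.
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

For instance, a four-residue chain folded into a unit square, `hp_energy("HHHH", [(0, 0), (1, 0), (1, 1), (0, 1)])`, has one contact between its first and last residues. The sketch checks self-avoidance but not that consecutive residues sit on adjacent lattice points.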
Fig. 2. The main flowchart of GA.
3. GAOSS method
GA was proposed by Holland (1975), and Unger and Moult (1993) applied it to the PSP. In principle, GA imitates the process of biological evolution. First, some individuals are generated randomly. For the 2D HP model of protein folding, an individual represents a protein chain with the prescribed sequence, and each individual has a different 2D conformation. During the calculation, the population size (the number of protein chains) is fixed. Once the fitness is defined (in this paper we set the free energy of the protein chain to be the fitness), all individuals are evaluated by it (in the 2D HP model, the smaller the fitness is, the better the individual is and the greater its chance of survival). Then the operators of selection, reproduction, crossover, and mutation are used to generate the individuals of the next generation. After enough iterations (e.g., when the smallest fitness has remained constant for many iterations), an optimal solution might be obtained. The main flowchart of GA is shown in Fig. 2.
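The generic loop described above (random initialization, evaluation, selection and reproduction, crossover, mutation) can be sketched in Python as follows. This is a plain GA skeleton on a toy bit-string problem, not GAOSS: the function names, the fixed number of generations, the culling fraction, and the toy fitness are all assumptions made for illustration.

```python
import random

def run_ga(init, fitness, crossover, mutate, pop_size=20, generations=50,
           cull_fraction=0.25, seed=0):
    """Minimise `fitness` with a basic generational GA; return the best individual."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        # Evaluation: rank by fitness (smaller is better).
        population.sort(key=fitness)
        # Selection and reproduction: drop the worst, copy survivors to refill.
        n_keep = max(2, int(pop_size * (1.0 - cull_fraction)))
        survivors = population[:n_keep]
        parents = survivors + [rng.choice(survivors)
                               for _ in range(pop_size - n_keep)]
        # Crossover and mutation produce the next generation.
        rng.shuffle(parents)
        population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            for child in crossover(a, b, rng):
                population.append(mutate(child, rng))
        if len(parents) % 2 == 1:
            population.append(mutate(parents[-1], rng))
        # Keep the best solution seen so far.
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best

# Toy problem (illustrative only): minimise minus the number of ones.
def init_bits(rng, n=16):
    return [rng.randint(0, 1) for _ in range(n)]

def neg_ones(bits):
    return -sum(bits)

def one_point(a, b, rng):
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def flip_one(bits, rng, rate=0.1):
    child = list(bits)
    if rng.random() < rate:
        k = rng.randrange(len(child))
        child[k] = 1 - child[k]
    return child

best = run_ga(init_bits, neg_ones, one_point, flip_one)
```

The sketch stops after a fixed number of generations; the stagnation criterion mentioned in the text (smallest fitness constant for many iterations) would replace the fixed loop bound.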
GA performs well in the PSP of the 2D HP model (Unger and Moult, 1993). In seeking the optimal GSC of proteins with short sequences, GA and other modified GA methods (Unger and Moult, 1993; Jiang et al., 2003; Cox and Johnston, 2006; Guo et al., 2006; König and Dandekar, 1999) are very efficient, but as the protein sequence length increases, the number of conformations grows so fast that these blind search methods become infeasible. Taking account of three kinds of real protein SSs (Linderstrøm-Lang and Schellman, 1959), Liang and Wong (2001) proposed a more efficient method, EMC, to simulate protein folding in the 2D HP model. Each kind of SS in Liang and Wong (2001) is composed of 10 consecutive hydrophobic amino acids, and the three kinds are a β-sheet and two kinds of α-helix (with different directions). In fact, for real proteins, the types of SS are β-sheet, α-helix, and β-turn, so the β-turn SS should be taken into account. On the other hand, the shorter the SS is, the more flexible the protein conformation will be, and the larger the possibility of finding the optimal GSC. For these two reasons, we propose GAOSS.

Fig. 3. SSs used in GAOSS. (a) β-Sheet, (b) β-turn, and (c) α-helix.

In our GAOSS three new kinds of SSs are chosen, which are usually composed of 6 mers and are illustrated in Fig. 3. A 6-mer is the shortest chain that can form an α-helix structure. However, the lengths of these SSs should sometimes be chosen to suit the sequence, otherwise the efficiency of seeking the optimal GSC will be lower. We select k-mer as the length of the SSs, where k is related to the average length of all consecutive hydrophobic subsequences (CHSs) and is defined as follows:
\[
k = \lfloor \bar{d} \rfloor = \left\lfloor \frac{1}{M} \sum_{i=1}^{M} d_i \right\rfloor ,
\tag{4}
\]
where “⌊ ⌋” represents the greatest integer function, d̄ is the average length of all CHSs, M is the number of CHSs, and d_i is the length of the ith CHS. These CHSs are regarded as regular SSs and are configured automatically by the recognition operator. The SSs in Liang and Wong (2001) are not permitted to mutate during the evolution, but the three kinds of SSs in our GAOSS mutate as follows:
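\[
\begin{cases}
\beta\text{-sheet} \rightarrow \beta\text{-turn or } \alpha\text{-helix}, \\
\beta\text{-turn} \rightarrow \alpha\text{-helix or } \beta\text{-sheet}, \\
\alpha\text{-helix} \rightarrow \beta\text{-sheet or } \beta\text{-turn},
\end{cases}
\tag{5}
\]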
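Eqs. (4) and (5) can be sketched in Python as follows. The paper does not state a minimum length for a CHS or how the target SS type is chosen in Eq. (5); the sketch assumes runs of at least two H residues (`min_len=2`) and a uniform random choice between the two alternatives. All function and constant names are hypothetical.

```python
import random
import re

SS_TYPES = ("beta-sheet", "beta-turn", "alpha-helix")

def find_chs(sequence, min_len=2):
    """Return 1-based (head, tail) positions of each run of consecutive H residues.

    min_len is an assumed threshold, not a value given in the paper.
    """
    return [(m.start() + 1, m.end())
            for m in re.finditer(r"H+", sequence)
            if m.end() - m.start() >= min_len]

def ss_length(sequence, min_len=2):
    """Eq. (4): floor of the mean CHS length; 0 if the chain has no CHS."""
    runs = find_chs(sequence, min_len)
    if not runs:
        return 0
    total = sum(tail - head + 1 for head, tail in runs)
    return total // len(runs)

def mutate_ss(ss_type, rng=random):
    """Eq. (5): replace an SS by one of the two other types.

    The uniform choice between the two alternatives is an assumption.
    """
    return rng.choice([t for t in SS_TYPES if t != ss_type])
```

For the 20-mer of Fig. 1, `find_chs` under these assumptions reports the two HH runs at positions (6, 7) and (14, 15), so `ss_length` gives k = 2.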
In Eq. (5), “→” means “mutate into”. Furthermore, the head and tail parts of these SSs are also permitted to mutate partly. The three kinds of SSs shown in Fig. 3 can construct seven kinds of tertiary structures: (1) all β-sheet, (2) all β-turn, (3) all α-helix, (4) a mixture of β-sheet and β-turn, (5) a mixture of β-turn and α-helix, (6) a mixture of α-helix and β-sheet, and (7) a mixture of β-sheet, β-turn, and α-helix. The operators used in our GAOSS are somewhat different from those used in standard GA, and we explain them as follows.
3.1. Recognition
For a given protein sequence, the recognition operator is used to recognize whether any CHS exists in the protein chain, and we set the following flag variable F to show the result:
\[
F =
\begin{cases}
1, & \text{one or more CHSs exist in the protein}, \\
0, & \text{no CHS exists in the protein}.
\end{cases}
\tag{6}
\]
Meanwhile, we use a two-dimensional matrix R to save the sequence positions of the CHSs, where the matrix elements R(i, 1) and R(i, 2) save the head and tail positions of the ith CHS, respectively.
3.2. Generating initial individuals
In the 2D HP model, Cartesian coordinates are used to describe the two-dimensional spatial positions of amino acids. For a given protein sequence with n residues, the positions of the first two mers are fixed to be (0, 0) and (1, 0), respectively. For the other n − 2 residues, we use a direction vector to generate their position coordinates, where 0, 1, and 2 denote the forward, left, and right directions, respectively, so that the direction vector elements are drawn from the set {0, 1, 2}. In the process of generating initial individuals, if more than one residue occupies the same point in the lattice, the protein conformation is not permitted; we call such a conformation a non-self-avoiding walk. To eliminate such invalid protein conformations, we adopt the “recoil growth” algorithm described in Guo et al. (2006), which involves growing the chain one residue at a time, checking the validity of the incomplete conformation at each step, and backtracking when an invalid subconformation is generated.
For a given protein sequence without CHS, i.e., if F = 0, the initial individuals can be generated randomly by use of the aforementioned direction vector.
For a given protein sequence with CHS, i.e., if F = 1, the SS conformations should be taken into account. If k = 6, the SSs shown in Fig. 3 are chosen. If k is equal to a multiple of 6, the aforementioned seven kinds of tertiary structures are selected. The remaining part of the CHSs is generated randomly by use of the aforementioned direction vector.
3.3. Evaluation
The individuals should be evaluated before the operators of selection, reproduction, crossover, and mutation are applied. In this paper we set the free energy defined by Eq. (2) to be the fitness of each individual. The smaller the fitness is, the better the individual is. We rank the population by fitness; the best solution is saved and is not replaced until a better one is obtained in a later generation. If the free energy of the best individual reaches our expected value or remains constant over many iterations, or if the evolutionary loop has run enough iterations, the results are output; otherwise the program goes to the next step, the selection operator.
3.4. Selection and reproduction
In our GAOSS, roulette wheel selection is adopted as the main strategy, and the survival probability of the ith individual, P_i, is proportional to the absolute value of its free energy. P_i is defined as follows:
\[
P_i = \frac{E_i}{\sum_{j=1}^{N} E_j},
\tag{7}
\]
where E_i is defined by Eq. (2) and N is the population size. By means of Eq. (7), the worst 10–25% of the individuals are deleted. To keep the population size fixed, some of the good individuals, i.e., those with smaller fitness, are reproduced.
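A minimal Python sketch of this step is given below. The text does not specify exactly how roulette wheel selection is combined with deleting the worst individuals, so the sketch shows one possible reading: the worst fraction is dropped by rank and the freed slots are refilled by sampling survivors with the probabilities of Eq. (7). The function names, the default `cull_fraction`, and the uniform fallback when all energies are zero are assumptions.

```python
import random

def survival_probabilities(energies):
    """Eq. (7): P_i = E_i / sum_j E_j.

    Since all HP energies are <= 0, this equals |E_i| / sum_j |E_j|.
    Falls back to uniform probabilities if every energy is zero (assumption).
    """
    total = sum(abs(e) for e in energies)
    if total == 0:
        return [1.0 / len(energies)] * len(energies)
    return [abs(e) / total for e in energies]

def select_and_reproduce(population, energies, cull_fraction=0.25, rng=random):
    """Delete the worst individuals and refill by roulette-wheel reproduction.

    Assumes a non-empty population with one energy per individual.
    """
    # Rank indices from best (lowest energy) to worst.
    order = sorted(range(len(population)), key=lambda i: energies[i])
    n_keep = len(population) - int(len(population) * cull_fraction)
    kept = order[:n_keep]
    # Reproduce survivors with probability proportional to |E_i|.
    weights = survival_probabilities([energies[i] for i in kept])
    refill = rng.choices(kept, weights=weights, k=len(population) - n_keep)
    return [population[i] for i in kept + refill]
```

The population size is unchanged by `select_and_reproduce`, matching the fixed-size population described in Section 3.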
3.5. Crossover
Here we use multi-point crossover as in König and Dandekar (1999). The crossover probability Pc decreases linearly, as in Jiang et al. (2003). To keep the integrity of the SSs, the crossover operator is forbidden from acting on the SSs and is only permitted to act on the remaining part of the CHSs and on the non-CHS residues.
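A crossover that respects this restriction can be sketched in Python on the direction-vector encoding of Section 3.2 (0 forward, 1 left, 2 right, with the first two residues at (0, 0) and (1, 0)). For brevity the sketch uses a single cut point rather than multi-point crossover, represents the SS restriction as a set of forbidden cut indices, and retries a bounded number of times; these choices and all names are illustrative assumptions rather than the authors' implementation.

```python
import random

def decode(directions):
    """Turn relative moves (0 forward, 1 left, 2 right) into lattice coordinates."""
    coords = [(0, 0), (1, 0)]
    dx, dy = 1, 0                      # current heading
    for move in directions:
        if move == 1:                  # left turn: rotate heading by +90 degrees
            dx, dy = -dy, dx
        elif move == 2:                # right turn: rotate heading by -90 degrees
            dx, dy = dy, -dx
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def is_self_avoiding(directions):
    """True if no lattice point is occupied by more than one residue."""
    coords = decode(directions)
    return len(set(coords)) == len(coords)

def one_point_crossover(parent_a, parent_b, protected, rng=random, max_tries=100):
    """Swap tails at a random cut that does not fall inside a protected SS.

    protected: set of cut indices that would split a secondary structure
               (hypothetical representation of the SS restriction).
    Returns two valid children, or copies of the parents if none is found.
    """
    cuts = [c for c in range(1, len(parent_a)) if c not in protected]
    for _ in range(max_tries):
        if not cuts:
            break
        c = rng.choice(cuts)
        child_a = parent_a[:c] + parent_b[c:]
        child_b = parent_b[:c] + parent_a[c:]
        # Validity check: both children must be self-avoiding walks.
        if is_self_avoiding(child_a) and is_self_avoiding(child_b):
            return child_a, child_b
    return list(parent_a), list(parent_b)
```

As a check on the encoding, `decode([0, 1, 1])` walks forward once and then turns left twice, giving a five-residue hook.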
As an example, we show the case of one-point crossover as follows:
\[
\begin{array}{l}
(S_a^{(1)}, \ldots, S_a^{(n)}) \\
(S_b^{(1)}, \ldots, S_b^{(n)})
\end{array}
\;\Rightarrow\;
\begin{array}{l}
(S_a^{(1)}, \ldots, S_a^{(c)}, S_b^{(c+1)}, \ldots, S_b^{(n)}) \\
(S_b^{(1)}, \ldots, S_b^{(c)}, S_a^{(c+1)}, \ldots, S_a^{(n)})
\end{array}
\tag{8}
\]
Checking the validity of new individuals. After crossover, if more than one residue occupies the same point in the lattice, the crossover operator is applied to the two parent individuals again.
3.6. Mutation
In our algorithm, an m-point mutation with probability Pm is used, where m is chosen randomly from the range 1 to n/2 and Pm increases linearly, as in Jiang et al. (2003). For a protein sequence with F = 1, there are three cases: (1) if 1 ≤ m < 6, all m bits mutate within the set {0, 1, 2}; (2) if 6 ≤ m ≤ √n, the m bits mutate following Eq. (5); (3) if √n < m ≤ n/2, the m bits mutate in the same way as the initial population is created. For an individual with F = 0, no SS is included, and the mutation strategy is that of Liang and Wong (2001).
Checking the validity of new individuals. After mutation, if more than one residue occupies the same point in the lattice, the mutation operator is applied to the individual again; otherwise the program goes to the next step, the evaluation operator (see Section 3.3). The main flowchart of GAOSS is shown in Fig. 4.

Fig. 4. The main flowchart of GAOSS.

4. Results and discussions
By means of GAOSS we calculate nine benchmarks and compare the results with those obtained by ENLS (Guo et al., 2006), GTS (Jiang et al., 2003), EMC (Liang and Wong, 2001), and GA (Unger and Moult, 1993). The nine benchmarks are shown in Table 1 and the corresponding results obtained by the five methods are listed in Table 2. In our GAOSS, we choose Pc = 0.8 and Pm = 0.1. For the 20-mer, 24-mer, and 25-mer protein sequences, our program runs 100 iterations with a population of 100 individuals. For the 36-mer sequence, the number of iterations is 100 and the population size is 200. For the 48-mer, 60-mer, and 64-mer sequences, the number of iterations is 200 and the population size is 400. For the 85-mer sequence, the number of iterations and the population size are 200 and 500, respectively.

Table 1
Nine benchmarks calculated in this paper.
Length  Protein sequence
20      HPHPPHHPHPPHPHHPPHPH
24      HHPPHPPHPPHPPHPPHPPHPPHH
25      PPHPPHHPPPPHHPPPPHHPPPPHH
36      PPPHHPPHHPPPPPHHHHHHHPPHH
        PPPPHHPPHPP
48      PPHPPHHPPHHPPPPPHHHHHHHHH
        HPPPPPPHHPPHHPPHPPHHHHH
50      HHPHPHPHPHHHHPHPPPHPPPHPP
        PPHPPPHPPPHPHHHHPHPHPHPHH
60      PPHHHPHHHHHHHHPPPHHHHHHHH
        HHPHPPPHHHHHHHHHHHHPPPPHH
        HHHPHHPHP
64      HHHHHHHHHHHHPHPHPPHHPPHHP
        PHPPHHPPHHPPHPPHHPPHHPPHP
        HPHHHHHHHHHHHH
85      HHHHPPPPHHHHHHHHHHHHPPPPP
        PHHHHHHHHHHHHPPPHHHHHHHHH
        HHHPPPHHHHHHHHHHHHPPPHPPH
        HPPHHPPHPH

From Table 2 one can see that the results for the shorter protein sequences (20-mer, 24-mer, 25-mer, 36-mer, and 50-mer) obtained by the five methods are all the same, so one cannot single out a best algorithm. As the protein length increases, however, the results begin to differ. For the 60-mer sequence, GAOSS and ENLS find the GSCs with the lowest free energy, −36. For the 64-mer sequence, only GAOSS obtains the GSC with the lowest free energy, −42. For the 85-mer sequence, GAOSS and EMC find the GSCs with the lowest free energy, −52. This shows that, as proteins become longer, the GSCs become more complicated and the search for the GSC becomes more difficult. Algorithms with lower efficiency may not be able to find the GSC, or may find it only at a much higher cost in time. From the results listed in Table 2 one can see that GAOSS matches or outperforms the other four methods on every benchmark for the PSP of the 2D HP model. For complicated proteins with very long sequences, the efficiency of GAOSS would be much higher than that of the other algorithms.
Table 2
The lowest free energies of the nine benchmarks obtained by five kinds of methods.
Length  GAOSS  ENLS   GTS    EMC    GA
20      −9     −9     −9     −9     −9
24      −9     −9     −9     −9     −9
25      −8     −8     −8     −8     −8
36      −14    −14    −14    −14    −14
48      −23    −23    −23    −23    −22
50      −21    −21    −21    −21    −21
60      −36    −36    −35    −35    −34
64      −42    −39    −39    −39    −37
85      −52                  −52
Fig. 5. Five GSCs for the protein sequence with 24-mer obtained by GAOSS, where the lowest free energy is −9.
Additionally, by means of GAOSS we study the diversity of protein GSCs and find that GAOSS is always able to find several GSCs for each protein sequence; sometimes the GSCs are quite different from each other. This would be useful for the design of protein molecules. As examples, we discuss the diversity of protein GSCs for the 24-mer and 85-mer sequences.
The 24-mer protein sequence is a widely studied benchmark for the 2D HP model, and one of its GSCs was shown in Cox and Johnston (2006). In this paper we have obtained five GSCs, shown in Fig. 5. One can see that all of the conformations possess a hydrophobic core; the cores in Fig. 5(b)–(d) are compact and the others are not. The conformations in Fig. 5(b) and (e) are symmetric and the others are asymmetric. In short, these five GSCs are quite different from each other.
The 85-mer protein sequence is a long benchmark for the 2D HP model and has rarely been investigated. In Liang and Wong (2001), although two GSCs with the lowest free energy, −52, were obtained by means of the EMC algorithm, only one kind of hydrophobic core was found, in which the SSs are all α-helices. By means of our GAOSS method, we also obtain two GSCs with the lowest free energy, −52, which are shown in Fig. 6. From Fig. 6 one can see that the conformations are quite different from each other; even the hydrophobic cores are not the same. In Fig. 6(a), there exist β-sheet, β-turn, and α-helix SSs and their mixtures. In Fig. 6(b), there are mainly α-helix SSs. The CHSs in these two GSCs display several kinds of SSs, and this provides rich choices for the design of protein folding.

Fig. 6. Two GSCs for the protein sequence with 85-mer obtained by GAOSS, where the lowest free energy is −52.

5. Brief summary
In this paper we introduce the 2D HP model and the GA method. Based on the EMC algorithm (Liang and Wong, 2001), we propose the GAOSS method. After analyzing the conformations of real proteins, we modify the SSs used by the EMC method (Liang and Wong, 2001) as follows: (1) The β-turn SS is taken into account. (2) To improve the flexibility of the SSs, we shorten the SSs used in EMC, and the three basic SSs in GAOSS are all composed of 6 mers. A 6-mer is the shortest chain that can form an α-helix. (3) Not only β-sheet, β-turn, and α-helix, but also their mixtures are used as SSs in GAOSS. This makes the choice of SSs in GAOSS richer than that in EMC. (4) When the length of a CHS is larger than that of the basic SS, the edges of the CHS can also be treated by the crossover and mutation operators. After these modifications, GAOSS is more efficient in seeking the protein GSC of the 2D HP model and, at the same time, is a powerful tool for studying the diversity of protein GSCs.
By means of GAOSS, nine benchmarks have been calculated and the results have been compared with those of four other methods. The comparison shows that the lowest free energies of the GSCs obtained by GAOSS are never higher than those obtained by the other algorithms. In addition, as examples, we discuss the diversity of protein GSCs for the 24-mer and 85-mer sequences. Several GSCs have been found by GAOSS, and some of the conformations are quite different from each other, which would be useful for the design of protein molecules.
Acknowledgments
This work was supported by the National Natural Science Foundation of China, Grant No. 10974061 and the Program for Innovative
Research Team of the Higher Education in Guangdong, Grant No.
06CXTD005.
References
Alm, E., Baker, D., 1999. Prediction of protein-folding mechanisms from free-energy
landscapes derived from native structures. Proc. Natl. Acad. Sci. U.S.A. 96,
11305–11310.
Anfinsen, C.B., 1973. Principles that govern the folding of protein chains. Science
181, 223–230.
Cox, G.A., Johnston, R.L., 2006. Analyzing energy landscapes for folding model proteins. J. Chem. Phys. 124, 204714–204728.
Creighton, T.E., 1978. Experimental studies of protein folding and unfolding. Prog.
Biophys. Mol. Biol. 33 (3), 231–297.
Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A., Yannakakis, M., 1998. On
the complexity of protein folding. J. Comput. Biol. 5, 423–446.
Custódio, F.L., Barbosa, H.J.C., Dardenne, L.E., 2004. Investigation of the three-dimensional lattice HP protein folding model using a genetic algorithm. Genet.
Mol. Biol. 27, 611–615.
Gilad, W., Nurit, H., Haim, J.W., Ruth, N., 2006. A permissive secondary structure-guided superposition tool for clustering of protein fragments toward protein
structure prediction via fragment assembly. Bioinformatics 22, 1343–1352.
Guo, Y.Z., Meng, E.M., Wang, Y., 2006. Exploration of two-dimensional hydrophobic-polar lattice model by combining local search with elastic net algorithm. J. Chem.
Phys. 125, 154102–154106.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
James, L.C., Tawfik, D.S., 2003. Conformational diversity and protein evolution – a
60-year-old hypothesis revisited. Trends Biochem. Sci. 28, 361–368.
Jiang, T.Z., Cui, Q.H., Shi, G.H., Ma, S.D., 2003. Protein folding simulations of the
hydrophobic–hydrophilic model by combining tabu search with genetic algorithms. J. Chem.
Phys. 119 (8), 4592–4596.
König, R., Dandekar, T., 1999. Improving genetic algorithms for protein folding simulations by systematic crossover. BioSystems 50, 17–25.
Kim, P.S., Baldwin, R.L., 1990. Intermediates in the folding reactions of small proteins.
Annu. Rev. Biochem. 59, 631–660.
Lau, K.F., Dill, K.A., 1989. A lattice statistical mechanics model of the conformational
and sequence spaces of proteins. Macromolecules 22 (10), 3986–3997.
Levinthal, C., 1968. Are there pathways for protein folding? J. Chim. Phys. 65,
44–45.
Levinthal, C., 1985. In: Debrunner, P., Tsibris, J.C.M., Munck, E. (Eds.), Mossbauer
Spectroscopy in Biological Systems, Proceedings of a Meeting held at Allerton
House. University of Illinois Press, Urbana, pp. 22–24.
Levitt, M., 1983. Protein folding by restrained energy minimization and molecular
dynamics. J. Mol. Biol. 170, 723–764.
Liang, F.M., Wong, W.H., 2001. Evolutionary Monte Carlo for protein folding simulations. J. Chem. Phys. 115 (7), 3374–3380.
Linderstrøm-Lang, K.U., Schellman, J.A., 1959. Protein structure and enzyme activity.
The Enzymes 1, 443–510.
Rohl, C.A., 2004. Protein structure prediction using Rosetta. Methods Enzymol. 383,
66–93.
Ruczinski, I., 2002. Distributions of beta sheets in proteins with application to structure prediction. Proteins 48, 85–97.
Skolnick, J., 2000. Derivation of protein-specific pair potentials based on weak
sequence fragment similarity. Proteins 38, 3–16.
Skolnick, J., 2003. Touchstone: a unified approach to protein structure prediction.
Proteins 53, 469–479.
Unger, R., Moult, J., 1993. Genetic algorithms for protein folding simulations. J. Mol.
Biol. 231 (1), 75–81.
Wetlaufer, D.B., 1973. Nucleation, rapid folding, and globular intrachain regions in
proteins. Proc. Natl. Acad. Sci. U.S.A. 70, 697–701.
Zhang, Y., Skolnick, J., 2004. Automated structure prediction of weakly homologous
proteins on a genomic scale. Proc. Natl. Acad. Sci. U.S.A. 101, 7594–7599.
Zhang, Y., Skolnick, J., 2005. The protein structure prediction problem could be solved
using the current PDB library. Proc. Natl. Acad. Sci. U.S.A. 102, 1029–1034.
Zwanzig, R., Szabo, A., Bagchi, B., 1992. Levinthal’s paradox. Proc. Natl. Acad. Sci.
U.S.A. 89, 20–22.