Computational Biology and Chemistry 34 (2010) 137–142

Research article

Protein folding simulations of 2D HP model by the genetic algorithm based on optimal secondary structures

Chenhua Huang, Xiangbo Yang ∗, Zhihong He

MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, South China Normal University, Zhongshan Road, Guangzhou 510631, China

Article history: Received 12 February 2009; received in revised form 20 December 2009; accepted 27 April 2010

Keywords: Protein folding; HP model; Genetic algorithm; Secondary structure

Abstract

In this paper, starting from the evolutionary Monte Carlo (EMC) algorithm, we make four improvements and propose a so-called genetic algorithm based on optimal secondary structure (GAOSS) method to predict efficiently the protein folding conformations in the two-dimensional hydrophobic–hydrophilic (2D HP) model. Nine benchmarks are tested to verify the effectiveness of the proposed approach, and the results show that for the listed benchmarks GAOSS finds the best solutions reported so far. This suggests that reasonable, effective and compact secondary structures (SSs) can avoid blind searches and reduce the computing time significantly. In addition, as examples, we discuss the diversity of protein ground state conformations (GSCs) for the 24-mer and 85-mer sequences. Several GSCs have been found by GAOSS and some of the conformations are quite different from each other, which would be useful for the design of protein molecules. GAOSS would be an efficient tool for protein structure prediction (PSP).

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Protein folding is an interesting topic that has attracted much attention. An incomplete list of contributions is given as follows.
Levinthal proposed a famous paradox (Levinthal, 1969, 1968; Zwanzig et al., 1992): how can a protein find its native state without a globally exhaustive search? Wetlaufer (1973) pointed out that proteins fold much too fast (by at least tens of orders of magnitude) to involve an exhaustive search. The experimental methods for studying protein folding include nuclear magnetic resonance, fast kinetics, etc., and experimental results show that there exists “cooperativity” in protein folding (Creighton, 1978; Kim and Baldwin, 1990). Anfinsen (1973) proposed that a protein in its natural environment folds into, or vibrates around, a unique three-dimensional structure, the native conformation. This indicates that a protein structure is determined by its amino acid sequence and that the structure can be predicted from the amino acid sequence alone. Finding the lowest energy tertiary structure of a protein from its amino acid sequence has become the main task of protein structure prediction (PSP), and this problem has been recognized to be “NP-hard” (Crescenzi et al., 1998). Recently, James and Tawfik (2003) revisited the conformational diversity of proteins and outlined a hypothesis based on the “new view” of proteins whereby one sequence can adopt multiple structures and functions. Proteins display complicated structures, which are categorized into four levels. Prediction of the quaternary and tertiary structures is based on secondary structures (SSs). Using SSs to predict structures and functions of proteins has become a vital problem.

∗ Corresponding author. Tel.: +86 139 2887 8165; fax: +86 20 8521 5536. E-mail address: xbyang@scnu.edu.cn (X. Yang).
doi:10.1016/j.compbiolchem.2010.04.002
The feasibility of cutting protein sequences into fragments, constructing fragment databases, assigning structures to these fragments, and assembling the substructures in order to predict the structure of proteins has been demonstrated in the literature with various methods and fragment sizes (Gilad et al., 2006; Rohl et al., 2004; Ruczinski et al., 2002; Skolnick et al., 2000, 2003; Zhang and Skolnick, 2004, 2005).

In this paper we focus on the two-dimensional hydrophobic–hydrophilic (2D HP) model (Lau and Dill, 1989), a widely studied abstract model that has been used by chemists to evaluate new hypotheses of protein structure formation. This is a free energy model, in which one assumes that the major contribution to the free energy of the native conformation of a protein comes from the interactions between hydrophobic amino acids, which tend to form a core in the spatial structure, while hydrophilic amino acids shield the core from the surrounding solvent. In order to search for the ground state conformation (GSC), people have presented molecular dynamics methods (Levitt, 1983), statistical mechanical models (Alm and Baker, 1999), and some probabilistic search algorithms for the 2D HP model, etc. Among the latter, the well-known ones include the Monte Carlo (MC) method (Unger and Moult, 1993), the evolutionary Monte Carlo (EMC) model (Liang and Wong, 2001), simulated annealing and the genetic algorithm (GA) (Unger and Moult, 1993; Custódio et al., 2004; Jiang et al., 2003; Cox and Johnston, 2006), the hybrid of GA and tabu search (GTS) (Jiang et al., 2003), and the elastic net algorithm with local search (ENLS) (Guo et al., 2006). In order to check the effectiveness of these algorithms, many protein sequences of lattice models have been calculated. It is found that these optimization algorithms are not very fast and that, for long protein chains, the optimal GSC sometimes cannot be obtained.

In this paper we propose a so-called GA based on optimal SS (GAOSS) and use it to calculate nine benchmarks. The results show that GAOSS can not only accelerate the search for protein GSCs but also increase the chance of finding more protein GSCs. GAOSS would be an efficient tool for the PSP of the 2D HP model.

This paper is organized as follows. Section 2 introduces the 2D HP model. In Section 3 we describe our GAOSS method in detail. The results and discussions are presented in Section 4, and a brief summary is given in Section 5.

Fig. 1. The conformation of the 2D HP model for the 20-mer protein with the sequence (HPHPPHHPHPPHPHHPPHPH), where the free energy for the protein sequence is E = −9 and distinct symbols denote hydrophilic and hydrophobic amino acids.

2. 2D HP model

It is well known that hydrophobicity is one of the key factors determining the folding of an amino acid chain. In the 2D HP model (Lau and Dill, 1989), the 20 kinds of amino acids are divided into two classes according to their hydrophobicity: H (hydrophobic/nonpolar) and P (hydrophilic/polar) residues. Hydrophobic amino acids tend to come together and form a compact core to exclude water. Based on this abstraction, a protein sequence can be regarded as a string of binary characters, H and P, and we define $S = S_1 S_2 \ldots S_n$ as a protein chain with n amino acids, where the character $S_i$ ($i = 1, 2, \ldots, n$) denotes the ith amino acid and is H (P) when the corresponding amino acid is a hydrophobic (hydrophilic) residue. The protein sequence is then arranged as a 2D self-avoiding walk on a lattice, where adjacent characters in the sequence occupy adjacent grid points and no grid point in the lattice is occupied by more than one character.
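The self-avoiding walk constraint described above can be illustrated with a minimal Python sketch. The function name `is_self_avoiding_walk` and the representation of a conformation as a list of integer (x, y) tuples are assumptions made here for illustration; they are not taken from the original method.

```python
def is_self_avoiding_walk(coords):
    """Return True if coords is a valid 2D lattice conformation.

    coords is a list of (x, y) integer tuples, one per residue, in
    sequence order (an assumed representation, for illustration only).
    """
    # No grid point may be occupied by more than one residue.
    if len(set(coords)) != len(coords):
        return False
    # Residues adjacent in the sequence must occupy adjacent grid points,
    # i.e. differ by exactly one unit step along x or y.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False
    return True
```

For instance, a 4-mer folded around a unit square passes the check, while a chain that steps back onto its own first site does not.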
The energy of a protein conformation is defined as the negative of the number of topological contacts between hydrophobic amino acids that are adjacent on the lattice but not neighbors in the sequence. The free energy between the ith and jth amino acids is given as follows:

$$\varepsilon_{ij} = \begin{cases} -1.0, & \text{the pair of H and H residues} \\ 0.0, & \text{others} \end{cases} \qquad (1)$$

The free energy for the protein sequence can be obtained as follows:

$$E = \sum_{i,j} r_{ij}\,\varepsilon_{ij}, \qquad (2)$$

where each pair (i, j) is counted once and the parameter

$$r_{ij} = \begin{cases} 1, & S_i \text{ and } S_j \text{ are adjacent but not neighbor amino acids} \\ 0, & \text{others} \end{cases} \qquad (3)$$

So the problem of optimizing protein folding is transformed into the calculation of the minimal free energy of the protein folding conformation. As an example of the 2D HP model, we show the conformation of the 20-mer protein with the sequence (HPHPPHHPHPPHPHHPPHPH) in Fig. 1, where the free energy for the protein sequence is E = −9.

Fig. 2. The main flowchart of GA.

3. GAOSS method

GA was proposed by Holland (1975), and Unger and Moult (1993) applied it to the PSP. In principle, GA imitates the process of biological evolution. Firstly, some individuals are generated randomly. For the 2D HP model of protein folding, an individual represents a protein chain arranged following the demanded sequence, but each individual possesses a different 2D conformation. During the calculation, the population size (the number of protein chains) is fixed. After the fitness is defined (in this paper we set the free energy of the protein chain to be the fitness), all of the individuals are evaluated by the fitness (in the 2D HP model, of course, the smaller the fitness is, the better the individual is and the greater its chance of survival). Then the operators of selection, reproduction, crossover, and mutation are used for generating the individuals of the next generation. After enough iterations (e.g., when the smallest fitness has stayed constant over many iterations), an optimal solution might be obtained. The main flowchart of GA is shown in Fig. 2.
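The free energy of Eqs. (1)–(3), which serves as the GA fitness, can be sketched in Python as follows. The function name `hp_energy` and the coordinate-list input are assumptions for illustration, and the sketch assumes the conformation is already a valid self-avoiding walk.

```python
def hp_energy(sequence, coords):
    """Free energy E of Eq. (2) for an HP sequence on a 2D square lattice.

    sequence is a string over {"H", "P"}; coords is a list of (x, y)
    tuples giving the lattice site of each residue (assumed valid).
    """
    position_of = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only to the right and upward so that every lattice
        # contact is counted exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position_of.get(neighbour)
            # r_ij = 1 only for lattice-adjacent residues that are not
            # consecutive in the sequence; epsilon_ij = -1 only for H-H.
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

As a small check, `hp_energy("HHHH", [(0, 0), (1, 0), (1, 1), (0, 1)])` evaluates to −1, because only the first and last residues form a non-bonded H–H contact.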
GA performs well in the PSP of the 2D HP model (Unger and Moult, 1993). For seeking the optimal GSC of proteins with short sequences, the efficiencies of GA and other modified GA methods (Unger and Moult, 1993; Jiang et al., 2003; Cox and Johnston, 2006; Guo et al., 2006; König and Dandekar, 1999) are very high, but as the protein sequence length increases, the number of conformations grows so fast that these blind search methods become infeasible. Taking account of three kinds of real protein SSs (Linderstrøm-Lang and Schellman, 1959), Liang and Wong (2001) proposed a more efficient method, EMC, to simulate the protein folding of the 2D HP model. Each kind of SS in reference (Liang and Wong, 2001) is composed of 10 consecutive hydrophobic amino acids, and they belong to β-sheet and two kinds of α-helix (with different directions), respectively. In fact, for real proteins, the types of SS are β-sheet, α-helix, and β-turn, so the β-turn SS should be taken into account. On the other hand, the shorter the SS is, the more flexible the protein conformation will be, and the larger the chance of finding the optimal GSC will be. For these two reasons, we propose a so-called GAOSS. In our GAOSS three kinds of new SSs are chosen, which are usually composed of 6 mers and are illustrated in Fig. 3. A 6-mer is the shortest chain that can construct an α-helix structure. However, the lengths of these SSs should sometimes be chosen suitably, otherwise the efficiency of seeking the optimal GSC will be lower.

Fig. 3. SSs used in GAOSS. (a) β-Sheet, (b) β-turn, and (c) α-helix.
We select k-mer as the length of SSs, where k is related to the average length of all consecutive hydrophobic subsequences (CHSs) and is defined as follows:

$$k = \lfloor \bar{d} \rfloor = \left\lfloor \frac{1}{M} \sum_{i=1}^{M} d_i \right\rfloor, \qquad (4)$$

where “$\lfloor\ \rfloor$” represents the greatest integer function, $\bar{d}$ is the average length of all CHSs, M is the number of CHSs, and $d_i$ is the length of the ith CHS. These CHSs are regarded as regular SSs and will be configured automatically by the recognition operator. Each kind of SS in reference (Liang and Wong, 2001) is not permitted to mutate during the evolution, but the three kinds of SSs in our GAOSS will mutate as follows:

$$\begin{cases} \beta\text{-sheet} \rightarrow \beta\text{-turn or } \alpha\text{-helix} \\ \beta\text{-turn} \rightarrow \alpha\text{-helix or } \beta\text{-sheet} \\ \alpha\text{-helix} \rightarrow \beta\text{-sheet or } \beta\text{-turn} \end{cases} \qquad (5)$$

where “→” means “mutate into”. Furthermore, the head and tail parts of these SSs are also permitted to mutate partly. The three kinds of SSs shown in Fig. 3 can construct seven kinds of tertiary structures: (1) all β-sheet, (2) all β-turn, (3) all α-helix, (4) a mixture of β-sheet and β-turn, (5) a mixture of β-turn and α-helix, (6) a mixture of α-helix and β-sheet, and (7) a mixture of β-sheet, β-turn and α-helix. The operators used in our GAOSS are somewhat different from those used in standard GA and we explain them as follows.

3.1. Recognition

For a given protein sequence, the recognition operator is used for recognizing whether there exist CHSs in the protein chain, and we set the following flag variable F to show the result:

$$F = \begin{cases} 1, & \text{one or more CHSs exist in the protein} \\ 0, & \text{no CHS exists in the protein} \end{cases} \qquad (6)$$

Meanwhile, we use a two-dimensional matrix R to save the sequence positions of the CHSs, where the matrix elements R(i, 1) and R(i, 2) save the head and tail positions of the ith CHS, respectively.

3.2. Generating initial individuals

In the 2D HP model, Cartesian coordinates are used for describing the two-dimensional spatial positions of amino acids. For a given protein sequence with n residues, the positions of the first two mers are fixed to be (0, 0) and (1, 0), respectively. For the other n − 2 residues, we use a direction vector to generate their position coordinates, where 0, 1, and 2 denote the forward, left, and right directions, respectively, so that the direction vector elements are taken from the set {0, 1, 2}. Obviously, in the process of generating initial individuals, if more than one residue occupies the same point in the lattice, then this protein conformation is not permitted; we call this phenomenon a non-self-avoiding walk. In order to eliminate such invalid protein conformations, we adopt the “recoil growth” algorithm proposed in reference (Guo et al., 2006), which involves growing the chain one residue at a time, checking the validity of the incomplete conformation at each step, and backtracking when an invalid subconformation is generated.

For a given protein sequence without CHS, i.e., if F = 0, the initial individuals can be generated randomly by use of the aforementioned direction vector. For a given protein sequence with CHS, i.e., if F = 1, the SS conformations should be taken into account. If k = 6, the SSs shown in Fig. 3 are chosen. If k is equal to a multiple of 6, the aforementioned seven kinds of tertiary structures are selected. The rest of the CHSs will be generated randomly by use of the aforementioned direction vector.

3.3. Evaluation

The individuals should be evaluated before the operators of selection, reproduction, crossover, and mutation are applied. In this paper we set the free energy defined by Eq. (2) to be the fitness of each individual. The smaller the fitness is, the better the individual will be. We rank the population by fitness; the best solution is saved and is not replaced until a better one is obtained in a later generation. If the free energy of the best individual satisfies our expected value or stays constant during many iterations, or the evolutionary loop has run enough iterations, then the results are output; otherwise the program goes to the next step, the selection operator.

3.4. Selection and reproduction

In our GAOSS, roulette wheel selection is adopted as the main strategy, and the survival probability of the ith individual, $P_i$, is proportional to the absolute value of its free energy. $P_i$ is defined as follows:

$$P_i = \frac{E_i}{\sum_{j=1}^{N} E_j}, \qquad (7)$$

where $E_i$ is defined by Eq. (2) and N is the population size. By means of Eq. (7), 10–25% of the worst individuals will be deleted. In order to keep the population size fixed, part of the good individuals, i.e., those with smaller fitness, will be reproduced.

3.5. Crossover

Here we use multi-point crossover just as in reference (König and Dandekar, 1999). The crossover probability $P_c$ decreases linearly as in reference (Jiang et al., 2003). In order to keep the integrity of SSs, the crossover operator is forbidden from acting on the SSs and is only permitted to act on the rest of the CHSs and on the non-CHS residues. As an example, we show the case of one-point crossover as follows:

$$\begin{matrix} (S_a^{(1)}, \ldots, S_a^{(n)}) \\ (S_b^{(1)}, \ldots, S_b^{(n)}) \end{matrix} \;\Rightarrow\; \begin{matrix} (S_a^{(1)}, \ldots, S_a^{(c)}, S_b^{(c+1)}, \ldots, S_b^{(n)}) \\ (S_b^{(1)}, \ldots, S_b^{(c)}, S_a^{(c+1)}, \ldots, S_a^{(n)}) \end{matrix}. \qquad (8)$$

Checking the validity of new individuals. After crossover, if more than one residue occupies the same point in the lattice, the crossover operator should be applied to the two parent individuals again.

3.6. Mutation

In our algorithm, an m-point mutation with probability $P_m$ is used, where m takes a value chosen randomly from the range 1 to n/2 and $P_m$ increases linearly as in reference (Jiang et al., 2003). For a protein sequence with F = 1, there exist three cases as follows: (1) $1 \le m < 6$, the m bits all mutate within the set {0, 1, 2}; (2) $6 \le m \le \sqrt{n}$, the m bits mutate following formula (5); (3) $\sqrt{n} < m \le n/2$, the m bits mutate following the way of creating the initial population. For an individual with F = 0, no SS is included, and the mutation strategy is the same as that in reference (Liang and Wong, 2001).

Checking the validity of new individuals. After mutation, if more than one residue occupies the same point in the lattice, the mutation operator should be applied to the individual again; otherwise the program goes to the next step, the evaluation operator (see Section 3.3). The main flowchart of GAOSS is shown in Fig. 4.

Fig. 4. The main flowchart of GAOSS.

Table 1
Nine benchmarks calculated in this paper.

Length  Protein sequence
20      HPHPPHHPHPPHPHHPPHPH
24      HHPPHPPHPPHPPHPPHPPHPPHH
25      PPHPPHHPPPPHHPPPPHHPPPPHH
36      PPPHHPPHHPPPPPHHHHHHHPPHHPPPPHHPPHPP
48      PPHPPHHPPHHPPPPPHHHHHHHHHHPPPPPPHHPPHHPPHPPHHHHH
50      HHPHPHPHPHHHHPHPPPHPPPHPPPPHPPPHPPPHPHHHHPHPHPHPHH
60      PPHHHPHHHHHHHHPPPHHHHHHHHHHPHPPPHHHHHHHHHHHHPPPPHHHHHPHHPHP
64      HHHHHHHHHHHHPHPHPPHHPPHHPPHPPHHPPHHPPHPPHHPPHHPPHPHPHHHHHHHHHHHH
85      HHHHPPPPHHHHHHHHHHHHPPPPPPHHHHHHHHHHHHPPPHHHHHHHHHHHHPPPHHHHHHHHHHHHPPPHPPHHPPHHPPHPH

4. Results and discussions

By means of GAOSS we calculate nine benchmarks and compare the results with those obtained by ENLS (Guo et al., 2006), GTS (Jiang et al., 2003), EMC (Liang and Wong, 2001), and GA (Unger and Moult, 1993). The nine benchmarks are shown in Table 1 and the corresponding results obtained by the aforementioned five methods are listed in Table 2. In our GAOSS, we choose $P_c = 0.8$ and $P_m = 0.1$. For the protein sequences of 20-mer, 24-mer, and 25-mer, our program runs 100 iterations with a population of 100 individuals. For the 36-mer sequence, the number of iterations is 100 and the population size is 200. For the sequences of 48-mer, 60-mer, and 64-mer, the numbers of iterations are all 200 and the population sizes are all 400. For the 85-mer sequence, the number of iterations and the population size are 200 and 500, respectively.
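The recognition operator of Section 3.1, the SS length of Eq. (4), and the direction-vector encoding of Section 3.2 can be illustrated with a minimal Python sketch. The function names, the `min_len=2` threshold (the text does not state the shortest run of H residues that counts as a CHS), and the convention that "left" is a counterclockwise turn are all assumptions made for illustration.

```python
import re


def recognize_chs(sequence, min_len=2):
    """Recognition operator sketch: return (F, R) for an HP sequence.

    R lists the 1-based (head, tail) positions of every run of
    consecutive H residues; F is 1 if at least one run exists, else 0.
    min_len=2 is an assumption, not a value given in the source.
    """
    pattern = "H{%d,}" % min_len
    R = [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]
    F = 1 if R else 0
    return F, R


def ss_length(R):
    """Eq. (4): k is the floor of the mean CHS length (requires F == 1)."""
    lengths = [tail - head + 1 for head, tail in R]
    return sum(lengths) // len(lengths)


def decode(directions):
    """Map a direction vector of length n - 2 to n lattice coordinates.

    Codes: 0 = forward, 1 = left, 2 = right, relative to the last step.
    "Left" is taken as a counterclockwise turn (an assumed convention).
    """
    coords = [(0, 0), (1, 0)]  # the first two residues are fixed
    dx, dy = 1, 0              # current heading
    for code in directions:
        if code == 1:
            dx, dy = -dy, dx   # turn left
        elif code == 2:
            dx, dy = dy, -dx   # turn right
        elif code != 0:
            raise ValueError("direction codes must be 0, 1, or 2")
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords
```

Under these assumptions, the 24-mer benchmark yields F = 1 with two CHSs at positions (1, 2) and (23, 24), so k = 2. `decode` does not check self-avoidance; a decoded conformation would still have to pass the validity check before being accepted.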
From Table 2 one can see that the results for the shorter protein sequences (20-mer, 24-mer, 25-mer, 36-mer, and 50-mer) obtained by the aforementioned five methods are all the same, so no best algorithm can be singled out. But as the protein length increases, the results differ from each other. For the 60-mer sequence, the GAOSS and ENLS methods find the GSCs with the lowest free energy, −36. For the 64-mer sequence, only the GAOSS method has obtained the GSC with the lowest free energy, −42. For the 85-mer sequence, the GAOSS and EMC methods find the GSCs with the lowest free energy, −52. This shows that, as proteins become longer, the GSCs become more complicated and the search for the GSC becomes more difficult. Algorithms with lower efficiency may fail to find the GSC, or may find it only at a much higher time cost. The results listed in Table 2 indicate that, on these benchmarks, our GAOSS algorithm matches or improves on the other four methods for the PSP of the 2D HP model. For complicated proteins with very long sequences, the efficiency of GAOSS is expected to be much higher than those of the other algorithms.

Table 2
The lowest free energies of the nine benchmarks obtained by five kinds of methods.

Length  GAOSS  ENLS  GTS  EMC  GA
20      −9     −9    −9   −9   −9
24      −9     −9    −9   −9   −9
25      −8     −8    −8   −8   −8
36      −14    −14   −14  −14  −14
48      −23    −23   −23  −23  −22
50      −21    −21   −21  −21  −21
60      −36    −36   −35  −35  −34
64      −42    −39   −39  −39  −37
85      −52                −52

Fig. 5. Five GSCs for the protein sequence with 24-mer obtained by GAOSS, where the lowest free energy is −9.

Additionally, by means of GAOSS we study the diversity of protein GSCs and find that the GAOSS method is always able to find several GSCs for each protein sequence; sometimes the GSCs are quite different from each other. This would be useful for the design of protein molecules. For example, we discuss the diversity of protein GSCs for the 24-mer and 85-mer sequences as follows.
The protein sequence with 24-mer is a widely studied benchmark for the 2D HP model, and one of its GSCs was shown in reference (Cox and Johnston, 2006). In this paper we have obtained 5 GSCs, shown in Fig. 5. One can see that all of the conformations possess a hydrophobic core, which is compact in Fig. 5(b)–(d) and not compact in the others. The conformations in Fig. 5(b) and (e) are symmetric and the others are asymmetric. In a word, these 5 GSCs are quite different from each other.

The protein sequence with 85-mer is a long benchmark for the 2D HP model and has rarely been investigated. In reference (Liang and Wong, 2001), although two GSCs with the lowest free energy −52 were obtained by means of the EMC algorithm, only one kind of hydrophobic core was found, where the SSs are all α-helix SSs. By means of our GAOSS method, we also obtain two GSCs with the lowest free energy −52, which are shown in Fig. 6. From Fig. 6 one can see that the conformations are quite different from each other; even the hydrophobic cores are not the same. In Fig. 6(a), there exist β-sheet, β-turn, α-helix, and their mixture SSs. In Fig. 6(b), there are mainly α-helix SSs. The CHSs in these two GSCs display several kinds of SSs, and this provides rich choices for the design of protein folding.

Fig. 6. Two GSCs for the protein sequence with 85-mer obtained by GAOSS, where the lowest free energy is −52.

5. Brief summary

In this paper we introduce the 2D HP model and the GA method. Based on the EMC algorithm (Liang and Wong, 2001), we propose a so-called GAOSS method. After analyzing the conformations of real proteins, we reform the SSs used by the EMC method (Liang and Wong, 2001) as follows:

(1) The β-turn SS has been taken into account.
(2) In order to improve the flexibility of SSs, we shorten the length of the SSs in EMC, and the three basic SSs in GAOSS are all composed of 6 mers. A 6-mer is the shortest chain that can construct an α-helix.
(3) Not only β-sheet, β-turn, and α-helix, but also their mixture structures have been used as SSs in GAOSS. This makes the choices of SSs in GAOSS richer than those in EMC.
(4) When the length of a CHS is larger than that of the basic SS, the edges of the CHS can also be treated by the crossover and mutation operators.

After these modifications, GAOSS possesses higher efficiency in seeking the protein GSC of the 2D HP model, and meanwhile, GAOSS is powerful for studies of the diversity of protein GSCs. By means of GAOSS, nine benchmarks have been calculated and the results have been compared with those of four other methods. The comparison shows that the lowest free energies of the GSCs obtained by GAOSS are never higher than those obtained by the other algorithms. In addition, as examples, we discuss the diversity of protein GSCs for the 24-mer and 85-mer sequences. Several GSCs have been found by GAOSS and some of the conformations are quite different from each other, which would be useful for the design of protein molecules.

Acknowledgments

This work was supported by the National Natural Science Foundation of China, Grant No. 10974061 and the Program for Innovative Research Team of the Higher Education in Guangdong, Grant No. 06CXTD005.

References

Alm, E., Baker, D., 1999. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl. Acad. Sci. U.S.A. 96, 11305–11310.
Anfinsen, C.B., 1973. Principles that govern the folding of protein chains. Science 181, 223–230.
Cox, G.A., Johnston, R.L., 2006. Analyzing energy landscapes for folding model proteins. J. Chem. Phys. 124, 204714–204728.
Creighton, T.E., 1978. Experimental studies of protein folding and unfolding. Prog. Biophys. Mol. Biol. 33 (3), 231–297.
Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A., Yannakakis, M., 1998. On the complexity of protein folding. J. Comput. Biol. 5, 423–446.
Custódio, F.L., Barbosa, H.J.C., Dardenne, L.E., 2004. Investigation of the three-dimensional lattice HP protein folding model using a genetic algorithm. Genet. Mol. Biol. 27, 611–615.
Gilad, W., Nurit, H., Haim, J.W., Ruth, N., 2006. A permissive secondary structure-guided superposition tool for clustering of protein fragments toward protein structure prediction via fragment assembly. Bioinformatics 22, 1343–1352.
Guo, Y.Z., Meng, E.M., Wang, Y., 2006. Exploration of two-dimensional hydrophobic-polar lattice model by combining local search with elastic net algorithm. J. Chem. Phys. 125, 154102–154106.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
James, L.C., Tawfik, D.S., 2003. Conformational diversity and protein evolution – a 60-year-old hypothesis revisited. Trends Biochem. Sci. 28, 361–368.
Jiang, T.Z., Hua, Q., Cui, Shi, G.H., Ma, S.D., 2003. Protein folding simulations of the hydrophobic–hydrophilic model by combining tabu search with genetic algorithms. J. Chem. Phys. 119 (8), 4592–4596.
König, R., Dandekar, T., 1999. Improving genetic algorithms for protein folding simulations by systematic crossover. BioSystems 50, 17–25.
Kim, P.S., Baldwin, R.L., 1990. Intermediates in the folding reactions of small proteins. Annu. Rev. Biochem. 59, 631–660.
Lau, K.F., Dill, K.A., 1989. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22 (10), 3986–3997.
Levinthal, C., 1968. Are there pathways for protein folding? J. Chim. Phys. 65, 44–45.
Levinthal, C., 1969. In: Debrunner, P., Tsibris, J.C.M., Munck, E. (Eds.), Mossbauer Spectroscopy in Biological Systems, Proceedings of a Meeting held at Allerton House. University of Illinois Press, Urbana, pp. 22–24.
Levitt, M., 1983. Protein folding by restrained energy minimization and molecular dynamics. J. Mol. Biol. 170, 723–764.
Liang, F.M., Wong, W.H., 2001. Evolutionary Monte Carlo for protein folding simulations. J. Chem. Phys. 115 (7), 3374–3380.
Linderstrøm-Lang, K.U., Schellman, J.A., 1959. Protein structure and enzyme activity. The Enzymes 1, 443–510.
Rohl, C.A., 2004. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93.
Ruczinski, I., 2002. Distributions of beta sheets in proteins with application to structure prediction. Proteins 48, 85–97.
Skolnick, J., 2000. Derivation of protein-specific pair potentials based on weak sequence fragment similarity. Proteins 38, 3–16.
Skolnick, J., 2003. Touchstone: a unified approach to protein structure prediction. Proteins 53, 469–479.
Unger, R., Moult, J., 1993. Genetic algorithms for protein folding simulations. J. Mol. Biol. 231 (1), 75–81.
Wetlaufer, D.B., 1973. Nucleation, rapid folding, and globular intrachain regions in proteins. Proc. Natl. Acad. Sci. U.S.A. 70, 697–701.
Zhang, Y., Skolnick, J., 2004. Automated structure prediction of weakly homologous proteins on a genomic scale. Proc. Natl. Acad. Sci. U.S.A. 101, 7594–7599.
Zhang, Y., Skolnick, J., 2005. The protein structure prediction problem could be solved using the current PDB library. Proc. Natl. Acad. Sci. U.S.A. 102, 1029–1034.
Zwanzig, R., Szabo, A., Bagchi, B., 1992. Levinthal’s paradox. Proc. Natl. Acad. Sci. U.S.A. 89, 20–22.