We would like to thank the editor and reviewers for their helpful remarks. Below, we reply point by point to each reviewer.

Reviewer: 1

Recommendation: Author Should Prepare A Minor Revision

Comments: The paper is by and large an integration of two previous papers by the authors that still manages to include some extensions to improve their previous results. The paper is very well written and pleasant to read. Despite being mostly a paper describing an experimental approach to address a problem, both the problem and the solutions proposed are quite well presented, so that it should be possible to reproduce results, an important issue in papers of this kind. I suggest that the paper be accepted, but the final version should take into account the comments below:

a) In the first paragraph it is stated that proteins have a single native conformation that minimises some energy function. However, this is not always the case. A boiled egg (boiling denatures albumin) does not return to its initial conformation when cooled down, which shows that there might be more than one conformation that is stable (the energy function has more than one local optimum).

This is a good point. We have modified the text to state that Anfinsen’s experiment is interpreted to mean that for SOME proteins, the native state has minimum free energy. Additionally, we have added a footnote with references, stating that there are known exceptions to this statement, as exemplified by co-translationally folded proteins such as tailspike (kinetics), as well as by proteins having multiple metastable states (prions).

b) The sentence in the 2nd paragraph of the introduction (starting in line 35 of the first page), “solution of this problem ...”, is too long. Please rephrase it.
Indeed, this is the case, and the sentence has been replaced by two sentences: “medicine and the pharmaceutical industry, since successful tertiary structure prediction, given only the amino acid sequence information, would allow the computational screening of potential drug targets. In particular, there is currently much research on computational docking of drugs (small chemical ligands) to a protein surface (such as a G-protein-coupled receptor, the most common drug target).”

c) Footnotes 4 and 6 on page 3 are redundant!

The second footnote has been removed.

d) The conditions under which the experimental results were obtained should be better explained. Is the use of 60 double-core processors for 10/30 minutes equivalent to a single processor running for 60*2*10 = 1200 or 60*2*30 = 3600 minutes (i.e. 20 or 60 hours)? Please clarify.

These conditions have been better explained, and many additional experiments have been carried out. The best way to see the changes is to look at the file “diff.pdf”, where changes are given in a different color, while the final revision is in the file “new.pdf”. In particular, the Brown and Boston College clusters (roughly comparable specs) were used ONLY for batch processing the many experiments to be run in parallel. In each case, every single experiment was performed on a single core of a single processor – no parallelism of the algorithm was used at all. However, since in Table 3 we had to benchmark over 100 sequences, we used the cluster to batch process these on identical processors. Total time per experiment on a single core of a single processor is given in the paper (maximum around 35 minutes).

e) The comparison with Will’s results should also be clarified. Will’s method does not provide optimal solutions to the F set of benchmarks, but comparing Will’s runs of 3 / 5 minutes (180 / 300 secs) with the authors’ runs of a few hours does not seem fair.
When we wrote and submitted the paper, Will’s web server http://cpsp.informatik.uni-freiburg.de:8080/StructJSP.jsp was not functioning (it had not been functional over an extended period). During the revision process, Sebastian Will kindly sent us his executable and precomputed H-cores. In the meantime, Will’s server is now functional, but cannot handle the R, S, F90, F160 sequences due to the preset maximum computation time allowed by the server. We completed Tables 1 and 2 using Will’s server (Harvard instances) and executable code (R, S, F90, F160 sequences). To produce similar but random instances for further benchmarking, we used our implementation of the Altschul-Erikson diresidue shuffling algorithm, an algorithm that preserves EXACTLY the same (contiguous) diresidues – not just the same expected diresidue frequency. This additional benchmarking is given in Table 3, for which exact sequences and output, including errors (non-convergence), are available at http://bioinformatics.bc.edu/clotelab/

We believe that the value of our method is that it provides useful approximate solutions in instances where Will’s method fails (either due to lack of a precomputed hydrophobic core, due to the algorithm’s inability to successfully thread the available cores, or due to time limitations). There is value in both methods, which are in a sense complementary. We have tried in the paper to explain the conceptual differences between Will’s approach and our method. Since hydrophobic cores are pre-computed and stored, Will’s program is capable (when it converges) of determining the EXACT number of H-H contacts; i.e. provably there are no more contacts than the number determined by his program. However, as Will shows in his PhD thesis (page 129), his program converges on around 50% of random HP sequences up to a certain length. In contrast, our method ALWAYS provides an answer.
Within 180 seconds, by using local search, our program already computes a decent answer, while with greater computation time, the solution is improved to a near-optimal one. In summary, we believe that Will’s method and our method are complementary in some sense. When Will’s program converges, it provides the optimal solution, and so of course is to be preferred. However, when his method does not converge, the user would nevertheless like a near-optimal solution. The latter is provided by our program.

f) Also, the fact that Will’s threading is not suitable in all instances is not clear. For instances of the S and R benchmark sets, was Will able to obtain solutions with threading or not? Please state this point clearly!

We have modified the caption to Table 1 to state clearly that ALL native energies were computed by using Will’s hydrophobic core threading algorithm. Though his web server does not converge for the S and R sequences, S. Will did perform offline computations of these sequences. He reported the values for the S sequences in his dissertation and for the R sequences in a paper co-authored by Backofen. References for the sequences and the native energies have been provided in the caption to Table 1.

g) Lattice models are something of an over-simplification of nature. In fact, the authors mention the CASP competition, where these models were applied initially, but no submissions were made with lattice models in recent versions. The initial section claims that lattice models can achieve RMSDs of a few Å, but RMSDs above 6 Å tend to convey very little useful information. Although the paper presents an interesting approach to a challenging optimisation problem, it is not clear that solutions to this problem are interesting for biochemists / biologists. You should provide some RMSD results regarding the solutions found and reported (namely for the Harvard instances).
The goal of this paper is to introduce a hybrid combinatorial optimization method that yields near-optimal results for an NP-hard problem, using a method that is complementary to that of Will, in the sense that our program can yield results when Will’s program outputs nothing. Although Skolnick’s lab has indeed used lattice protein threadings, subsequently followed by all-atom methods, such an approach is beyond the scope of this paper. Since the HP problem is degenerate (there are many – hundreds of thousands to perhaps millions – of “optimal” solutions having the maximum number of H-H contacts), it is not meaningful to find the RMSD between Will’s first solution and ours. (The ordering of Will’s solutions is arbitrary and depends on the order in which threadings through H-cores are undertaken.) Since our solution of the Harvard instances is optimal, it is among the solutions that Will’s program generates if one uses the “-allbest” flag in his program. Among the (hundreds of thousands to millions of) “optimal” structures for each Harvard instance, each having the maximum number of H-H contacts, the RMSD between two different “optimal” structures can be as large as 18 to 20 lattice units. Since a lattice unit is sqrt(2)=1.4142… rather than the standard 3.4 Angstroms, this can correspond to something as large as 48 Angstroms! This situation arises not from the choice of combinatorial optimization algorithm, but rather from the degenerate HP energy model. Dill introduced the HP model, crude as it is, in order to begin to focus on the optimization methods that need to be developed for real protein structure prediction. The contribution of our paper is to benchmark a novel optimization algorithm on the NP-hard problem for the admittedly degenerate HP model.

Additional Questions: 1. Which category describes this manuscript?: Practice / Application / Case Study / Experience Report 2. How relevant is this manuscript to the readers of this periodical?
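The unit conversion above can be sketched as a small check (a sketch only, using the two constants quoted in our reply: sqrt(2) lattice units per nearest-neighbour distance on the FCC lattice, and 3.4 Angstroms per consecutive residue):

```python
import math

LATTICE_UNIT = math.sqrt(2)   # nearest-neighbour distance on the FCC lattice
CA_CA_ANGSTROM = 3.4          # inter-residue distance in Angstroms, as quoted above

def lattice_to_angstrom(rmsd_lattice_units):
    """Convert an RMSD measured in lattice units to Angstroms."""
    return rmsd_lattice_units * CA_CA_ANGSTROM / LATTICE_UNIT

# An RMSD of 20 lattice units between two "optimal" structures
# corresponds to roughly 48 Angstroms.
print(round(lattice_to_angstrom(20)))
```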
Please explain your rating under Public Comments below.: Relevant 1. Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The paper addresses the problem of protein structure determination with a face-centred cubic lattice model. The paper mostly integrates two previous contributions of the authors (references 14 and 15), although it proposes some adaptations of the two papers (local search and large neighbourhood search) that, despite being minor, do seem to improve the previously obtained results. The integration of local search and constraint programming is an important topic in addressing combinatorial problems, and the paper is an interesting instance of such integration. 2. Is the manuscript technically sound? Please explain your answer under Public Comments below.: Yes 1. Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes 2. Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: References are sufficient and appropriate 3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Yes 4. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory 5. Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Easy to read Please rate the manuscript. Please explain your answer.: Good

Thank you for the comments.

Reviewer: 2

Recommendation: Author Should Prepare A Major Revision For A Second Review

Comments: My main concern is related to the description of the experimental results, and the comparison to existing approaches. First, neither sequences nor run-times are provided for Tables 1/2.
Second, it looks like you allow different run-times for the comparison (I hope that I am misunderstanding you here). For Will et al., you report the following: "we show results on instances for which Will's approach did not yield any solution within the given time limits (180 secs for sequences with 90 AS, 300 secs for seqs. with 160 AS)". This indicates that you allow only 5 minutes for the Will et al. approach. On the other hand, looking at Figure 7, you allow more than 2 hours for your approach. After 5 minutes, you would only get a conformation with approx. 320 contacts (instead of 360 for the optimal conformation).

When we submitted the paper, we did not have access to Will’s code, and his web server was not functional. Hence all we could do was report timings from his PhD thesis; i.e. Will chose the times, we did not. In revising the paper, we contacted Sebastian Will, who kindly sent us the executables of his code, thus allowing us to complete Tables 1 and 2 and to create the new Table 3. Run times in Table 3 are bounded by 30 minutes. Complete sequence data is available at the web site http://bioinformatics.bc.edu/clotelab/FCCproteinStructure/ which is cited in the paper.

You cannot base your comparison on different timings. Either allow the same for the Will et al. approach, or restrict your approach to the best conformation found within 5 minutes.

We have redone the benchmarks, now that we have Will’s program. Run time bounds for Will’s program and our program are now set to the same value.

Third, I do not understand your argument that "a fair comparison of the algorithms is not possible at this stage, since only the above 7 sequences are available". Then, in the next sentence, you report that you apply your approach to sequences for which Will's approach did not yield any solution within the given time limits. Where did these sequences come from? Did you apply the Will et al. approach?
If so, why didn't you let the algorithm run longer to produce sequences for comparison? This should be possible, since reference [55] also reports a failure rate of only 8% for sequences of length 135, which should be enough for comparison.

As previously explained, when this sentence was written, we did not have access to Will’s program and his web server did not function – so at the time the paper was first submitted, we could not perform benchmarking, but only cite results from Will’s thesis. In light of the new benchmarking done in the revision, we have removed the sentence “a fair comparison … not possible” and reworded the paragraph containing it. Subsequently, Will explained that for the length-90 examples (F90 in the paper), he fixed the number of hydrophobic residues to 50, with 40 polar residues. Given this, the sequence was generated by the hypergeometric distribution. In order to produce more examples as similar as possible to those of Will, we took his F90 and F180 sequences, and for each, we produced 10 random sequences having the same diresidues using our implementation of the Altschul-Erikson algorithm. This algorithm not only preserves the same residue counts (50 H’s and 40 P’s in the F90 case), but also preserves exactly the same diresidues; i.e. if a given F90 sequence had 24 diresidues of the form HP, then so does each randomization (i.e. not simply the same expected diresidue frequency, which could easily be generated by a first-order Markov chain). Another manner of producing additional instances was to concatenate a given F90 sequence with itself, thus obtaining 100 H’s and 80 P’s, then subsequently to produce 10 randomizations using the Altschul-Erikson algorithm just explained. Surprisingly, for these F90-doubled sequences we had a failure rate of 78% for runs timed at 30 minutes using Will’s algorithm, whereas Will’s F180 sequences of the same size had a 50% failure rate.
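To illustrate the diresidue-preserving property, the following is a minimal sketch of such a shuffle for an HP sequence. It is not our actual implementation: it uses rejection sampling (retry a randomized Eulerian walk until every doublet edge is consumed) rather than the arborescence construction of the original Altschul-Erikson algorithm, but any sequence it returns has exactly the same contiguous diresidue counts as the input.

```python
import random
from collections import Counter

def diresidue_preserving_shuffle(seq, max_tries=10000):
    """Return a shuffle of seq with EXACTLY the same contiguous pair counts.

    Build the multigraph whose edges are the consecutive pairs of seq,
    shuffle each vertex's outgoing-edge list, and attempt an Eulerian
    walk starting from seq[0]; retry until the walk consumes every edge.
    """
    if len(seq) < 3:
        return seq
    for _ in range(max_tries):
        edges = {}
        for a, b in zip(seq, seq[1:]):
            edges.setdefault(a, []).append(b)
        for out in edges.values():
            random.shuffle(out)
        walk, cur = [seq[0]], seq[0]
        for _ in range(len(seq) - 1):
            out = edges.get(cur, [])
            if not out:
                break  # walk got stuck before using all edges; retry
            cur = out.pop()
            walk.append(cur)
        if len(walk) == len(seq):
            return "".join(walk)
    return seq  # extremely unlikely fallback: return the input unchanged

s = "HPHHPPHPHHPPHH"
t = diresidue_preserving_shuffle(s)
# same multiset of contiguous pairs, e.g. the number of "HP" doublets
assert Counter(zip(s, s[1:])) == Counter(zip(t, t[1:]))
```

A successful Eulerian walk necessarily starts with the same residue as the input and, by a degree-counting argument, ends with the same residue as well, so first and last positions are also preserved.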
Clearly, the performance of Will’s threading algorithm depends on the number of size-k runs of H’s. Since this concerns Will’s algorithm, rather than our own work, we did not further analyze this dependence. We should here make an important observation. In theory, the CHCC algorithm of Yue and Dill will compute the optimal answer given sufficient time (perhaps decades or eons). The contribution of Will’s approach is to precompute the H-cores that the method of Yue and Dill computes “on the fly”, hence speeding up the approach of Yue and Dill considerably. The difficulty with the novel approach of Will is that he produces a set of “suboptimal” cores – however, since there may be very many cores, not all possible cores are in fact determined. Thus, even given infinite time, Will’s method could fail due to the fact that a suboptimal core is not available.

So a comparison based on more examples is clearly required.

This has been done. Full details of sequences, etc. are given at http://bioinformatics.bc.edu/clotelab/FCCproteinStructure.

Minor comment: Introduction, page 3, last paragraph, and page 4, first paragraph: the reference to the CHCC method of Yue and Dill is missing.

A reference has been added, as well as a footnote explaining the idea of Yue and Dill.

It should also be stated that the work by Backofen and Will transfers the idea of CHCC to the FCC lattice. The statement "By threading an HP sequence onto a hydrophobic core, the optimal conformation could be found for certain cases. However, if threading is not possible (which is often the case), no solution is returned" is misleading. It sounds like CHCC and related approaches, such as that of Backofen and Will, can be applied only in certain cases, which is not true.

Thanks. This is a good point, and we have modified the text in section 3 accordingly.
The issue is that Will’s program may require exponential time to output an answer (see the complexity estimate from Yue and Dill, PNAS 1995), while at ANY point in time, our approach outputs an (approximate) answer. Moreover, as explained above, Will precomputes a collection of suboptimal cores – however, since not all possible cores are precomputed, Will’s program can fail to return the optimal structure, even given infinite run time.

Furthermore, how do you justify the proposition "which is often the case"? Did you do a thorough comparison? If it is based on the fact that finding the optimal solution in 5 minutes fails in 50% of cases for random sequences (as you report on page 9), then this is not a valid statement, since an optimal folding could have been found when more time is provided.

Since S. Will has sent us his executables, we have now done benchmarking with a time limit of 30 minutes – see Table 3.

Yue K, Dill KA. Sequence-structure relationships in proteins and copolymers. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1993 Sep;48(3):2267–2278.
Yue K, Dill KA. Folding proteins with a simple energy function and extensive conformational searching. Protein Sci. 1996 Feb;5(2):254–261.

Thanks for the references.

Additional Questions: 1. Which category describes this manuscript?: Research/Technology 2. How relevant is this manuscript to the readers of this periodical? Please explain your rating under Public Comments below.: Relevant 1. Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The authors introduce a new method for lattice folding using local search methods (tabu search), and the combination of local search techniques with constraint programming techniques. The advantage of the method is that it would be applicable to other energy functions, not only the HP model. The disadvantage is that currently, it looks like they do not improve on the HP model. 2.
Is the manuscript technically sound? Please explain your answer under Public Comments below.: Yes 1. Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes 2. Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: Important references are missing; more references are needed

We have added additional references. Please see “diff.pdf” for new references indicated in a different color.

3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Could be improved 4. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory 5. Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Easy to read Please rate the manuscript. Please explain your answer.: Good

Reviewer: 3

Recommendation: Reject

Comments: This paper deals with lattice protein folding. The introduction is well written, although I did not see enough emphasis on the real issue of all these works, i.e. to what extent the algorithms for folding lattice proteins are relevant to the folding of real proteins. This is a major problem, since the current work (and many similar works) shows that on the lattice the performance of the various algorithms is reasonable, while for real proteins, similar directions of ab-initio folding are not successful at all.

We respectfully point out that authors such as Jun Liu (Harvard statistics), Samuel Kou (Harvard statistics), Wing Wong (Stanford statistics), Sebastian Will (computer science, Freiburg), and Rolf Backofen (computer science, Freiburg) all have recent publications concerning algorithms to compute optimal self-avoiding walks on 3-dimensional lattices.
Less recent work of Sorin Istrail (Brown computer science), William Hart (Sandia National Labs), and many others has considered various aspects of folding in lattice models. Our main point is that we describe a new algorithm, different from Monte Carlo, genetic algorithms, CHCC (hydrophobic core threading), etc., which provides fast approximate solutions. In the context of protein structure prediction, David Baker and Phil Bradley have argued that prediction = search strategy + energy model. Almost universally, the search strategy in protein structure prediction algorithms is some form of Monte Carlo algorithm (possibly with replica exchange, etc. – of course, this is not the case for molecular dynamics). The goal of this article was to benchmark a new search strategy on an NP-complete problem known to be a rough approximation to real protein folding.

Anyhow, in the body of the manuscript the authors suggest combining approaches of tabu search, constraint satisfaction, and what they call Large Neighborhood Search to solve difficult HP lattice models. All of these approaches have been published before (by the authors and others). The authors were very open about this point, but still my feeling is that the increment presented in this manuscript is not significant enough to be of wide interest. The current manuscript is very similar in many ways to the previous publications. The main algorithm presented here, Fig 5.5, is identical to the main algorithm (Fig 3) in the authors’ CP-08 paper. Even the authors describe some of the changes as “slight modifications” (first paragraph of section 5.5). The performance of the combined algorithm is somewhat better than the previous version (Tables 1 and 2), but the improvement is minor in most cases.

Conference proceedings are not final journal versions. It is usual practice in computer science to publish a first conference proceedings article, which is refined for the final journal submission.
The present article is the final journal version, which includes an algorithmic extension, extensive new benchmarking, etc. Additionally, the current paper includes a new local search algorithm which is superior to the previously published one, a new greedy initialization, an extension to the previously published LNS algorithm and a new LNS algorithm, along with extensive new benchmarking.

Additional Questions: 1. Which category describes this manuscript?: Practice / Application / Case Study / Experience Report 2. How relevant is this manuscript to the readers of this periodical? Please explain your rating under Public Comments below.: Interesting - but not very relevant 1. Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The paper suggests a combination (with minor modifications) of two previous methods published by the authors 2. Is the manuscript technically sound? Please explain your answer under Public Comments below.: Appears to be - but didn't check completely 1. Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes 2. Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: References are sufficient and appropriate 3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Yes 4. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory 5. Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Readable - but requires some effort to understand Please rate the manuscript. Please explain your answer.: Fair