تقرير نهائي لمشروع بحث Research project final report / Rapport final du projet de recherche 1 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 مستند إداري Administrative Document -------- Administrative information / المعلومات اإلداري 71-70-11 :المرجع Project Title - )عنوان المشروع (عربي وأجنبي Computational evaluation of protein energy functions ت ق ي يم حا سوب ي ل نماذج طاق ة ال بروت ين Principal Investigator - الباحث الرئيسي العنوان rthsta tahsaN االسم والشهرة Address nmansour@lau.edu.lb العنوان االلكتروني e-mail 01-786456 03-379647 رقم الهاتف Telephone Name & surname Lebanese American University المؤسسة Institution rooommorPooProsseforP mrnoiro الوظيفة Post Co-investigators - الباحثون المشاركون العنوان االلكتروني e-mail mirvat.elsibai@lau.edu.lb المؤسسة Institution Lebanese American University االسم والشهرة Name and surname Mirvat Sibai 2 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 .1 Duration and starting date of the research / المدة التعاقدي للمشروع وتاريخ بدء البحث PPfwoPsoarmP Duration (year) / المدة التعاقدية للمشروع rrfotorP4100 Starting date of the research /وتاريخ بدء البحث Scientific Information / العلمي المعلومات.2 ّ Objectives - الهدف (mandatory field to fill 5-8 lines) – ) أسطر8-5 : ( معلومات إلزامية The objective is to study the energy functions commonly used in the computational protein structure prediction methods in order to improve the capability of these functions, leading to better protein structures. More specifically, the objectives are: to determine which energy function is more appropriate for ab initio computational methods; whether different energy functions suit different protein types and sizes; and how the energy functions may be amended to provide better guidance to ab initio computational algorithms. Achievements - أالنجازات المحقق (mandatory field to fill 5-8 lines) – ) أسطر8-5 : ( معلومات إلزامية Our results show that the CHARMM energy model tends to yield better protein structures than the other two energy functions, next is AMBER without polarization, followed by LINUS and the polarization version of AMBER. We have also found that it is more favorable for computational protein prediction methods to assign unequal weights to the terms in the energy function. As a result, we recom-mend the following weight values for the CHARMM energy function: 0.1-0.3 for the electrostatic term, 0.1-0.2 for the Vander Waals term, and 0.6-0.8 for the Torsion term, when applied to small to medium size proteins. Perspectives - آفاق البحث (mandatory field to fill 5-8 lines) – ) أسطر8-5 : ( معلومات إلزامية We have carried out a computational assessment of the ability of three energy models for predicting the tertiary structure of proteins. We have also investigated the merit of assigning unequal weights to the terms included in energy functions employed for ab initio protein structure prediction (PSP). Our experimental results have validated our hypotheses. They showed that energy functions have different capabilities in guiding the PSP algorithms and that the energy terms should not have equal coefficient values. In fact, our work did yield some achievements, as mentioned above. However, we have also found that further experimental work, perhaps accompanied with some adjustments to the approach, still needs to be carried out in order to reach more solid conclusions. The 3 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 volume of the work requires more time to complete it since it involves heavy programming and very long execution times. Publications & Communications - المنشورات والمساهمات في المؤتمرات Mansour, N., Mohsen, H.: Computational Evaluation of Protein Energy Functions. Proceedings of the International Conference on Intelligent Computing; in Intelligent Computing in Bioinformatics, 288-299, Springer (2014). Abstract - موجز عن نتائج البحث (mandatory field to fill 5-8 lines) – ) أسطر8-5 : ( معلومات إلزامية Predicting protein structures is guided by the requirement of minimizing the associated energy value. However, the energy functions proposed so far are still in the exploration phase. Also, assigning equal weights to different terms in energy has not been wellsupported. In this project, we carry out a computational evaluation of putative protein energy functions. Our findings show that the CHARMM energy model tends to be more appropriate for ab initio computational techniques that predict protein structures. Also, we propose an approach based on a simulated annealing algorithm to find a better combination of energy terms, by assigning different weights to the terms, for the purpose of improving the capability of the computational prediction methods. P توقيع الباحث 4 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 Final report / Rapport Final Warning / Avertissement 1. The final report must be limited to results directly related to the research project supported by the Council excluding any other activity carried out by the investigator otherwise the report will be rejected. 2. Appendices may be added or attached to the report. 1. Le rapport final doit être limité aux résultats directement liés au projet de recherche soutenu par le Conseil à l'exclusion de toute autre activité menée par le chercheur sous peine de rejet. 2. Des annexes peuvent être ajoutées ou attachées au rapport. 1. Principal investigator / Chercheur principal Name and surname / Nom et prénoms Nashat Mansour Institution of affiliation / Institution d'affiliation Lebanese American University 2. Title of the project as proposed in the original application / Titre du projet tel qu'il a été proposé dans la demande originale (English and French / Anglais et Français) Computational evaluation of protein energy functions 5 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 3. Purpose of the project / Objectifs du projet (1page) Proteins are organic compounds made up of chains of amino acids that fold into complex 3-dimensional structures based on their chemical and physical properties. A protein is characterized by its 3D structure, which defines its biological function. Proteins fold into 3D structures in a way that leads to lowenergy state. Predicting these structures is guided by the requirement of minimizing the energy value associated with the protein structure. However, the energy functions proposed so far by biophysicists and biochemists are still in the exploration phase and their usefulness has been demonstrated only individually. Also, assigning equal weights to different terms in energy has not been well-supported. In this project, we carry out a computational evaluation of putative protein energy functions. We investigate the energy functions commonly used in the computational ab initio protein structure prediction (PSP) methods. The objectives are: to determine which energy function is more appropriate for ab initio PSP methods and how the energy functions may be amended to provide better guidance to these PSP algorithms. In particular, we explore the possibility of finding ‘optimal’, different weights to be assigned to the terms of the energy functions. The ultimate, strategic goal is to discover misfolded proteins and hence the reason for some relevant diseases in order to guide treatment. 6 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 4. Expected outputs / Résultats attendus 1 page max / 1 page au maximum The expected / achieved outputs are the following : 1. Finding one (or more) energy model that is more appropriate to guide the ab initio PSP algorithm in evolving better tertiary protein structures. The main metric used for this indentification is the Root mean square deviation (RMSD) value. 2. Developing suboptimal, unequal weights that can be assigned to the terms of the selected energy function and, thus, lead to improved guidance of the ab initio PSP algorithms. 7 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 5. Résultats obtenus / Obtained results 5 to 10 pages / 5 à 10 pages Appendices can be added a the end of this document / Des annexes peuvent être ajoutées à la fin de ce document Introduction Proteins are organic compounds that are made up of combinations of amino acids and are of different types and roles in living organisms. Initially a protein is a linear chain of amino acids, ranging from a few tens up to thousands of amino acids. Proteins fold, under the influence of several chemical and physical factors, into their 3-dimensional structures which determine their biological functions and properties. Misfolding occurs when the protein folds into a 3D structure that does not represent its correct native structure, which can lead to many diseases such as Alzheimer, several types of cancer, etc… [1]. Using computational methods for predicting the native structure of a protein from its primary sequence is an important and challenging task especially that this protein structure prediction (PSP) problem is computationally intractable. The protein’s sequence of amino acids defines a unique native fold which normally corresponds to a minimum energy value [2]. In theory, this free energy minimum can be computed from quantum mechanics and, thus, should help in predicting the structure from the sequence. In practice, the theoretical foundation of such functions has not been fully established and several energy functions have been proposed. The energy function models proposed so far depend on a number of biophysical factors. Their usefulness has been relatively demonstrated by different researchers. But, previous work has also shown that the precision of these energy models is not well-established [3]. Also, no serious comparative evaluation of these energy functions has been reported. Furthermore, limited work has been reported on the relative importance of the terms of the energy function; many decoys per protein, for a number of proteins, were generated from molecular dynamics trajectories and conformational search using the A-TASSER program minimizing the AMBER potential [4], [5]. In this work, we carry out a computational comparison of important energy functions that have appeared in the protein structure prediction literature in association with ab initio algorithms. We also design a simulated annealing algorithm for deriving values that should be used as weights for the energy terms based on the native structure knowledge on existing golden proteins in the ‘protein data bank’. The ultimate goal is to yield better prediction of the tertiary structure of proteins. This paper is organized as follows. Section 2 presents our methodology. Section 3 describes the energy models used for the comparative work. Section 4 explains the algorithm used for optimizing the weights of the energy terms. Section 5 presents the experimental results. Section 6 concludes the paper. Methodology The primary structure of a protein is a linear sequence of amino acids connected together via peptide bonds. Proteins fold due to hydrophobic effect, Vander Waals interactions, electrostatic forces, Hydrogen bonding, etc…. The protein structure prediction (PSP) problem is intractable [6]. Hence, the main computational approaches are heuristics for finding good suboptimal solutions and can be classified as: Homology modeling, threading, and ab initio methods [7]. For the latter methods, the only required input is the amino acid sequence whereas for the first two methods, data of previously predicted protein structures are used. Ab initio methods predict the 3D structure of proteins given their primary sequences without relying on protein databases. The underlying strategy is find the best possible structure based on a chosen energy function. Based on the laws of physics, the most stable structure is the one with the lowest possible energy. We have identified three energy models/force fields as the most recognized models for pure ab initio PSP methods: the CHARMM model [8], the LINUS energy function [9], and AMBER [10]. The different energy functions include different terms and make a variety of assumptions. But, the relative merits of these functions in guiding computational protein structure prediction methods have not been well-studied. In particular, a computational 8 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 investigation of their applicability has not been carried out. We believe that conducting such an investigation will serve the PSP research community by providing guidelines regarding the applicability of the recognized force fields. Our methodology is based on the following steps and activities. We employ our recently developed computational method for PSP, namely the adapted scatter search metaheuristic [3], [11], as the basic platform for analyzing the behavior of the different energy functions. The selected energy functions are simulated and incorporated into the scatter search algorithm to create different versions of the scatter search based program. Then, real-world proteins are selected from a protein databank. The impact of the energy functions will be evaluated by computing the widely used root mean square deviation (RMSD) of the target structure with respect to the reference/golden protein structure. Then, we computationally derive sub-optimal weights for the energy terms in order to further improve the prediction. Then, for the ‘winner’ energy function, we find alternative weights for its terms to replace the commonly-used equal weights. This is done by adapting a simulated annealing algorithm that aims to simultaneously minimize the energy values of several (golden) proteins whose structure is already known. Energy Functions/Models The stability of the three-dimensional structure for protein is determined by the intramolecular interactions and interactions with the external environment. The search for stable conformations of proteins is based on the minimum total energy of interaction. The three recognized energy models, which are selected for our experimental study, are described in the following subsections. CHARMM Energy Model The Chemistry at HARvard Molecular Mechanics (CHARMM) function [12] is based on the dihedral planes representation of proteins that can be defined by the degrees of freedom given by the torsion angles. There are four torsion angles present in each amino acid in a protein: phi φ, psi ψ, omega ω, and chi χ. Phi is the angle between the planes C-N-Cα and N-Cα-C, where N-Cα is the axis of rotation. This angle decides the distance of C-C of two amino acids. The chi angle is between the planes formed by the atoms of the side chains, and side chains can have as many as five chi angles. Fig. 1 shows a segment of a protein backbone. H R1 H Cα φ O O H C N C NH2 ψ R3 Cα Cα N H R2 plane A COOH H plane B Fig. 1. Segment of a protein backbone with planes of bonds N-Cα and Cα-C, plane A and plane B, respectively. The CHARMM energy function is given by E (c ) K b (b bo ) 2 bonds K K ( o ) ( o ) 2 K (1 cos( n )) dihedrals Aij Bij 12 10 rij electrostatic hydrogen rij ij 12 ij 6 4 ij r 12 r 6 van der waals ij ij angles imp improper dihedrals 2 qi q j 4o r rij 9 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 AMBER Energy Model The AMBER99 model is composed of several all-atom force fields. These fields include parameters for bonded potential energy terms (stretching, bending, and torsion) and nonbonded terms (charge and van der Waals). The original version was AMBER94, which was developed to improve on peptide backbone torsion parameters; Kollman and co-workers used RESP charges derived from high-level ab initio calculations to parameterize energies [13]. The subsequent force field, denoted as AMBER99, is intended for use both with and without polarization effects [14]. AMBER99 includes the following terms: Vbounded = å Kr (r - r eq bonds )2 + å Kq (q - qeq ) 2 angles + Vn [1+ cos(nj - g )] dihedrals 2 å ìïé A B ù q q üï Vnonbounded = åíê 12ij - ij6 ú + i j ý ïêë Rij Rij ûú 4pe Rij þï i< j î Vpol = - 1 mi Eio å 2 i The total energy includes the sum of all the potential fields and the polarization potential energy added in AMBER99. In the first equation , three terms represent contribution to the total energy from “bond stretching, bond angle bending, and torsion angle”, in the second one Vnonbonded is the sum of non-bonded energy as “van der Waals and electrostatic energies”. In the third equation, V pol the polarization value is calculated for each pair of point charge and induced point dipole moment as μ i and Ei is the electrostatic field at the ith atom generated by all other point charges qj. LINUS Energy Model LINUS (Local Independently Nucleated Units of Structure) is an ab initio method for simulating the folding of a protein on the basis of simple physical principles. LINUS involves a Metropolis Monte Carlo procedure, represents a protein by including all its heavy atoms (i.e., non-hydrogen atoms). LINUS developed by Srinivasan and Rose in 1995 is based on simple scoring function includes three components: hydrogen bonding, scaled contacts, and backbone torsion. The hydrogen bonding and the hydrophobic contact scores are calculated between pairs of atoms from residues. The contact energy is given by é Contact energy = maximum value x ê1.0 - ë ù ú (s +1.4)2 - s 2 û dij2 - s 2 The scaled contact has both “Repulsive and attractive” terms. The repulsive term is implemented by rejecting conformations when the interatomic distance between any two atoms is less than the sum of their van der Waals radii. All pairwise interactions are evaluated except those involving carbonyl carbons and the attractive term is applied between the side chain pseudo-atoms. The total energy of the LINUS function scoring is given by the negative sum of the three preceding terms: hydrogen bonding, scaled contact, and backbone torsion [9]. Simulated Annealing Algorithm for Energy Terms’ Weights Simulated Annealing (SA) is a metaheuristic that deals with optimization problems with large search space to find near-optimal solutions. In this paper, simulated annealing is employed to assign appropriate weights to different energy terms. SA aims to yield a good solution with a minimized objective function value. Algorithm Steps The outline of the SA algorithm is shown in Fig. 2 10 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 Initialize Solution X and initial Temperature T while (T > Steady State Temperature and iteration < MAX_ITERATIONs) for (NUMBER_OF_PERTURBATIONS) Y = Perturb(X) if (F(Y) < F(X) or e (-delta E/T) > random (0,1)) X=Y // end if // end for Update (T) // end while Fig. 2. Outline of the SA algorithm. Solution Representation Each energy term is given a random weight between 0 and 1, where the sum of weights must be 1 in any solution. The weight given to ES is called , that to VW is called β, and to Torsion energy is . Thus, the obtained total energy for all solution is calculated according to the following formula: Eprotein = .Golden-ESprotein + β.Golden-VWprotein + .Golden-Torsionprotein Initial Solution The initial solution is randomly generated. α, β, and are randomly generated with values between 0 and 1, such that α + β + = 1 and no single value falls below 0.1. Initial Temperature, Maximum Iterations, and Number of Perturbation Initial temperature is chosen as 400,000 and that of steady state is 15. The maximum number of iterations is chosen to be 200 with 3 perturbations in the ‘for’ loop of each iteration. Perturbation Method The implemented perturbation method alters the weights distribution among the energy terms. It either takes a certain percentage (0.1) from a target weight factor and distributes it to the other 2 weight factors or takes this percentage from two factors and adds it to the target third one. The method has 3 decisions to take randomly. The first one is which of the weight factors (α, β, and ) to target (whether to take from or add to). The second decision is whether to take from the chosen factor or give it more energy. The third decision is how to distribute the amount of energy taken/given. For example, the algorithm may randomly first choose α as the target, then randomly choose to take from it 0.1 of its value, and then randomly chooses to give 0.3 of the amount to be taken from α to β and 0.7 to . Objective Function The objective function to be minimized is the following: ∑ (Ei – AvgEi)2, for i=1 to n where n is the number of the selected proteins, Ei (defined in subsection 4.2) is the total energy obtained according to the weights distribution by α, β, and , and AvgEi is the average of the calculated energies, Ei for i=1..n, in the set of proteins according to weight distributions. Effectively, we are minimizing the mean square deviation in the values of Ei over the n proteins as the values of the weights vary. The SA metaheuristic algorithm enforces a lower bound of 0.1 for each of α, β, and in order not to prevent making any of the energy terms negligible. Thus, in any perturbed combination of values, the SA algorithm rejects any perturbation that yields weight values below 0.1. 11 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 Experimental Results In this section, we demonstrate the performance of our proposed methods using the 3 different energy models for generating native-like structures for the backbones of the three target proteins. Experimental procedure for energy functions In our experiments, we use three subject proteins with known structures in PDB. Our reference PDB is the Brookhaven database [15]. The 3 proteins are: 1CRN (CRAMBIN) is Plant seed protein (46 AAs); 1ROP (ROP Protein) is Transcription Regulation (56 AAs); 1UTG (UTEROGLOBIN) is Steroid Binding (70 AAs). We run the scatter search program for predicting protein structures based on each of the three energy models. The results are evaluated by computing the target protein’s structural difference from the reference/golden protein. This is accomplished by calculating the root mean square deviation (RMSD), in Angstrom, of the Cα atoms of the two proteins [16]. Experimental results for energy functions Table 1 gives the RMSD values by Scatter Search using the 3 types of potential energy function. These results show that CHARMM generates the lowest RMSD values for the 3 proteins. Also, the limited experiments reveal that the polarization term added in AMBER99 may not cause an improvement in the predicted protein structure. CHARMM also runs faster than AMBER and has comparable execution time to LINUS. Although the results in Table 1 demonstrate a rather clear tendency in favor of CHARMM, this does not yield a final conclusion since the number of proteins that are used in the experiment is small. Future work should employ many proteins with various sizes and functions in order to establish whether our result may vary with protein size and type. Experimental procedure for energy weights In order to experiment with weights for energy terms, the real energy values have to be obtained. These are the golden energy values obtained from the data retrieved from the Protein Data Bank (PDB) file of each protein. The data are the 3D coordinates of atoms, torsion angles and amino acids. The total energy value of a protein is the sum of the terms of the CHARMM energy function. Two sets of non-homologous proteins have been chosen, where each is made up of 5 different proteins. The first set contains proteins each with less than 100 amino acids (AAs), whereas the second contains proteins with more than 100 AAs in each. We run the SA algorithm on the data obtained from each set of proteins to generate values for the weights of the energy terms. The variation in the weights is also plotted over the SA iterations until convergence. Finally, to give an indication of the validity of the weight results, we rerun the scatter search program of Mansour et al. [11] to predict the structure of a protein, by using the new weights, and compare the resulting RMSD with that of the structure produced by using equal weights for the energy terms. Table 1. Experimental results for SS on 3 energy functions Energy function CHARMM AMBER96 AMBER99 with Polarization LINUS 1CRN (46 AAs, 326 atoms) 1ROP(56 AAs, 420 atoms) 1UTG(70 AAs, 547 atoms) Time (min) 964 1140 3600 RMSD 9.39 11.80 14.39 Time (min) 1059 1320 4200 RMSD 11.52 15.84 16.04 Time (min) 1182 1800 5005 RMSD 13.56 16.21 18.07 960 13.05 1005 15.22 1200 18.36 12 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 Results for energy weights Results of SA for proteins of less than 100 amino acids. For proteins with less than 100 amino acids (1RPO, 1UBQ, 1CRN, 1ROP, and 1UTG), SA converged after 20 iterations to the objective function value of 5.159x1015. The initial randomly generated solution was of objective function value of 2.178x10 17. Fig. 3 shows the variations of , β, and as a function of iterations of the SA. The values of the three weights at convergence are: α = 0.352; β = 0.105; = 0.543. Fig. 3. Variations of , β, and (x10-1) as a function of iterations Comparing the energy values obtained by using these weights with those obtained from the equal weights for the 1ROP protein show that the former ones (using the new weights) are lower. Equal weights (0.333) give the energy value of 1.125x106, whereas the obtained solution (with α, β, and values) gives the value of 426,072. Fig. 4 shows the values of ES, VW and Torsion energy terms as a function of iterations and the corresponding values of , β, and . After running the scatter search code, the obtained 1ROP protein structure has an RMSD with respect to the golden protein of 7 Angstrom in comparison with about 12 Angstrom, obtained with equal weights [11]. Fig. 4. Variations of ES, VW and Torsion Energies for 1ROP (in 10 5 charge units) as a function of iterations Results of SA for proteins of more than 100 amino acids. For proteins with more than 100 amino acids (1AAJ, 1BP2, 1RPG, 1RRO, and 1YCC), SA converged after 43 iterations to the objective function value of 1.457x1012. The initial randomly generated solution was of objective function value of 3.278x10 13. Fig. 5 shows the variations of , β, and as a function of iterations of the SA. The values of the three weights at convergence are: α = 0.104; β = 0.108; = 0.788. Comparing the energy values obtained by using these weights with those obtained from the equal weights for 13 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 the 1AAJ protein shows that the new weights yield lower values. Specifically, equal weights for all energy terms give the value of: 2.305x106. The obtained solution (with different α, β, and ) gives the value of: 742,844. Fig. 6 shows the values of ES, VW and Torsion energy terms as a function of iterations and the corresponding values of , β, and . After running the Scatter Search code, the obtained 1AAJ protein structure has an RMSD with the golden of 5.84 Angstroms, which is a promising result. By inspecting the values of each of the 3 energy terms, it is clear that in the case of equal weights, the VW term used to dominate the energy function and, consequently, the resulting protein structure. The change in the weight values provides fairer shares to the other two terms and, thus, allows them to influence the evolution of the predicted structure. We also note that the relative weights are different for different protein sizes. These results show that assigning equal weights to the energy terms does not yield the best possible protein structure prediction. But, more experiments are required to establish what weights should be assigned for what size-category of proteins. Also, it will be useful to apply our approach to other recognized energy functions. Fig. 5. Variations of , β, and (x10-1) as function of iterations Fig. 6. Variations of ES, VW and Torsion Energies in 1AAJ (in 105 charge units) as a function of iterations Conclusions We have carried out a computational assessment of the ability of three recognized energy models for predicting the tertiary structure of proteins using a pure ab initio algorithm. We have also investigated the merit of assigning unequal weights to the terms included in energy functions employed for ab initio protein structure prediction. Our experimental results show that the CHARMM energy model tends to yield better protein structures than 14 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 the other two energy functions, next is AMBER without polarization, followed by LINUS and the polarization version of AMBER. We have also found that it is more favorable for computational protein prediction methods to assign unequal weights to the terms in the energy function. As a result, we recommend the following weight values for the CHARMM energy function: 0.1-0.3 for the electrostatic term, 0.1-0.2 for the Vander Waals term, and 0.6-0.8 for the Torsion term, when applied to small to medium size proteins. This work has established the merit of the proposed approach. Further work can focus on extending the experimental work to a larger number of protein sets that include proteins with different sizes and functions and on extending to other energy models. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. Prusiner, S.B.: Prions. Proceedings of the National Academy of Sciences of the United States of America 95, 13363–13383 (1998) Anfinsen, C.B.: Principles that Govern the Folding of Proteins. Science, 181-187 (1973) Mansour, N., Kehyayan, C., Khachfe, H.: Scatter Search Algorithm for Protein Structure Prediction. Int. J. Bioinformatics Res. Appl. 5, 501-515 (2009) Wroblewska L., Skolnick J.: Can a Physics-Based, All-Atom Potential Find a Protein's Native Structure Among Misfolded Structures? J Comput Chem. 28, 2059-2066 (2007) Wroblewska, L., Jagielska, A., Skolnick, J.: Development of a Physics-Based Force Field for the Scoring and Refinement of Protein Models. Biophysical J. 94, 3227–3240 (2008) Unger, R. Moult, J.: Genetic Algorithms for Protein Folding Simulations. J. Mol. Biol. 231, 75–81 (1993) Sikder, A.R. Zomaya, A.Y.: An Overview of Protein-Folding Techniques: Issues and Perspectives. Int. J. Bioinformatics Res. Appl. 1, 121-143 (2005) Vanommeslaeghe, K., Hatcher, E., Acharya, C., Kundu, S., Zhong, S., Shim, J., Darian, E., Guvench, O., Lopes, P., Vorobyov, I., Mackerell Jr., A. D.: CHARMM General Force Field: A Force Field for Drug-Like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. J. Comput. Chem. 31, 671–90 (2009) Srinivasan, R., Fleming, P.J., Rose, G.D.: Ab Initio Protein Folding Using LINUS. Methods in Enzymology 383, 48–66 (2004) Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz, K.M., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W., Kollman, P.A.: A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 117, 5179-5197 (1995) Mansour, N., Ghalayini, I., Rizk, S., El- Sibai, M.: Evolutionary Algorithm for Predicting All-Atom Protein Structure. Int. Conf. on Bioinformatics and Computational Biology, New Orleans (2011) Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M.: CHARMM: a Program for Macromolecular Energy, Minimization, and Dynamics Calculations. Journal of Computational Chemistry 4, 187–217 (1983) Kollman P.A.: Advances and Continuing Challenges in Achieving Realistic and Predictive Simulations of the Properties of Organic and Biological Molecules. American Chemical Society (1996) Wang, J., Cieplak, P., Kollman, P.A.: How Well Does a Restrained Electrostatic Potential (RESP) Model Perform in Calculating Conformational Energies of Organic and Biological Molecules?. J. Comp. Chem. 21, 1049–1074 (2000) Bernstein F.C., Koetzle T.F., Williams G.J., Meyer E.F. Jr, Brice M.D., Rodgers J.R., Kennard O., Shimanouchi T., Tasumi M.: The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542 (1977) Carugo, O., Pongor, S.: A Normalized Root-Mean-Square Distance for Comparing Protein Three-Dimensional Structures. Protein Sci. 10, 1470–1473 (2001) 15 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 6. Summary table of expected and obtained results / Tableau récapitulatif des résultats attendus et des résultats obtenus Expected outputs / Résultats attendus Obtained results / Résultats obtenus 1. Finding one (or more) energy model that is more appropriate to guide the ab initio PSP algorithm in evolving better tertiary protein structures. 1. Our experimental results show that the CHARMM energy model tends to yield bet-ter protein structures than the other two energy functions, next is AMBER without polarization, followed by LINUS and the polarization version of AMBER. 2. Developing suboptimal, unequal 2. We have also found that it is more weights that can be assigned to the favorable for computational protein terms of the selected energy function prediction methods to assign unequal weights to the terms in the energy function. As a result, we recommend the following weight values for the CHARMM energy function: 0.1-0.3 for the electrostatic term, 0.1-0.2 for the Vander Waals term, and 0.6-0.8 for the Tor-sion term, when applied to small to medium size proteins. 7. Possible encountered difficulties / Difficultés éventuelles rencontrées 1. Technical challenge, which resulted from the extensive programming that was required for the experimental work and the very long execution times that were required for running the program even on small size proteins. This challenge prevented us from conducting further experiments, especially on large proteins. 2. Operational challenge due to the difficulty of finding qualified and committed students to assist in the programming and experimentation. Unfortunately, a few students had to contribute to the work which led to project instablity and a lot of management time during the handover periods (from a student to another). 16 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 3. Scientific publications )articles in peer review journals, books, communications, etc …) / Publications scientifiques (articles dans des revues à comité de lecture, livres, communications, etc …) Attach a copy of each publication as it appeared in the journal) / (Joindre une copie de chaque publication telle qu'elle a paru dans la revue) Related publications : Scatter search algorithm for protein structure prediction. International Journal of Bioinformatics Research and Applications, Vol. 5, No. 5, 2009, 501-515. (with C. Kehyayan, H. Khachfe). Evolutionary Algorithm for Predicting All-Atom Protein Structure. Proceedings of the Int. Conference on Bioinformatics and Computational Biology, New Orleans, March 23-25, 2011 (with I. Ghalayini, S. Rizk, M. El- Sibai). Particle swarm optimization approach for protein structure prediction in the 3D HP model. Journal of Interdisciplinary Sciences: Computational Life Sciences, Vol. 4, 1-11, 2012, Springer (with F. Kanj, H. Khachfe) Ab initio Protein Structure Prediction: Methods and Challenges. Book Chapter in Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, M. Elloumi and A. Y. Zomaya (Eds.), Wiley Book Series on Bioinformatics: Computational Techniques and Engineering, IEEE-Wiley, John Wiley & Sons Ltd., New Jersey, USA; 2013 (with J. Abbass and J.-C. Nebel) MMLD based Genetic algorithm for tag SNP selection. Journal of Interdisciplinary Sciences: Computational Life Sciences, Vol. 6, 1-9, 2014 Springer (with A. Mouawad) Computational Evaluation of Protein Energy Functions. Proceedings of the International Conference on Intelligent Computing; in Intelligent Computing in Bioinformatics, 288-299, Springer, 2014 (with H. Mohsen) 8. Oral presentations or posters in national, regional and international conferences / Présentations orales ou affichées à des congrès nationaux, régionaux ou internationaux. (Attach a copy of each presentation as it was presented or published in refereed conference proceedings)/ (Joindre une copie de chaque présentation telle qu'elle a été affichée ou publiée dans les comptes rendus des congrès) Presenting a paper, entitled «Computational Evaluation of Protein Energy Functions » at the 10th International Conference on Intelligent Computing, Taiyuan, August 3-6, 2014. 17 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 How to submit the final report ? Comment soumettre le rapport final ? -------The final report must be submitted to Council in two versions : A hard copy which can be mailed or delivered directly to the Council administrative seat; An electronic version, Word document, on CD-ROM or USB drive or email sent to the Council at the following address : grp@cnrs.edu.lb Le rapport final doit parvenir au Conseil en deux versions : Une version sur papier qui peut être envoyée par la poste ou déposée directement au siège administratif du Conseil ; Une version électronique en format Word sur CD-ROM ou sur clé USB, ou envoyée au Conseil par e-mail à l'adresse suivante : grp@cnrs.edu.lb 18 4102 برنامج دعم البحوث العلمية Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012 برنامج دعم البحوث العلمي في لبنان لعام 2712 صفح 8 -------- .17تقديم التقرير النهائي: .0100في نهاية المشروع (سنة أو سنتين) ،على الباحث تقديم تقرير نهائي (نسخة ورقية ونسخة إلكترونية بصيغة Wordعلى Pقرص مدمج أو USBأو ترسل إلى المجلس بواسطة البريد االلكتروني على العنوان التالي ،grp@cnrs.edu.lb :وذلك Pوفقاً للنموذج المعتمد في المجلس والموجود على موقع المجلس http://www.cnrs.edu.lbمرفقاً بالتصفية المالية لمشروع يبين فيه ما البحث .ال يقبل التقرير النهائي إالّ إذا عرض الباحث بشكل واضح جدوالً مفصالً ّ تم إنجازه مقارن مع تصوره لمخرجات المشروع عند قبوله ،على أن ال يتضمن سوى ما له عالقة مباشرة بمشروع البحث المدعوم من المجلس دون إغراقه بأية تفاصيل أو نشاطات أخرى والتركيز حص اًر على النتائج التي توصل اليها الباحث. .0104يعتمد المجلس في تقييم التقرير النهائي على األهمية العلمية للمقاالت الصادرة عن الباحث وذات العالقة بمشروع البحث المدعوم من المجلس من خالل عدد من المعايير والمؤشرات الدولية نذكر من بينها على سبيل المثال Impact Factor, Citation Index: 19 برنامج دعم البحوث العلمية 4102 Grant Program for Scientific Research in Lebanon – 2012 Programme de subvention à la recherche scientifique au Liban – 2012