Supplementary Material 1: Uniprot: Uniprot (Universal Protein resource) database (http://www.uniprot.org/) provides a free online comprehensive resource for protein sequence which is fully classified and accurately annotated [1]. It is developed by UniProt Consortium which comprises of groups from European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). The database is updated every month and the sequence is submitted in FASTA format. PDBsum: PDBsum is a graphical database (http://www.ebi.ac.uk/pdbsum/) that provides pictorial information in both 2D and 3D format [2]. It also provides other information such as protein chains, ligands, protein-protein interaction diagrams, Number of helices, number of beta, gamma turns, etc. Moreover, it provides wiring diagrams and topology diagrams of the query protein. It also provides information about protein-protein interfaces and residue-residue interactions. It is developed by European Bioinformatics Institute (EBI). More information about the server could be accessed from http://www.ebi.ac.uk/thornton-srv/databases/cgibin/pdbsum/GetPage.pl?pdbcode=n/a&template=doc_about.html. Protparam: Protparam is a web application that calculates physico-chemical properties from amino-acid sequence [3]. The website can be accessed from http://web.expasy.org/protparam/. The various properties are molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY). The protein under investigation can be specified as a accession number or as raw sequence. The documentation of various parameters can be accessed through http://web.expasy.org/protparam/protparam-doc.html. Pfam: Pfam is a comprehensive database (http://pfam.xfam.org/) of proteins domains and families and is developed by The Wellcome Trust Sanger Institute, UK; University of Helsinki, Finland; University of Oxford, UK; Stockholm Bioinformatics Centre, Sweden and Janelia Farm Research Campus, USA [4]. The current information of release notes can be accessed from (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/relnotes.txt). It uses Uniprot as its reference sequence database. Pfam uses various algorithms to provide results such as jackhammer [5] and Hidden Markov Models [6]. Domain and family identification is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the sequences. The data is constantly shared with other databases such as Structural Classification of Proteins (SCOP) [7] and CATH protein structure classification [8]. InterProScan: InterProScan (http://www.ebi.ac.uk/interpro/) is another database which contains broad information about protein domain and families [9]. The results obtained from Pfam were crosschecked and compared to the results of InterProScan. Sequence (amino acid or nucleic acid) submitted to InterProScan are matched against the signatures from several different databases. Sequences are submitted in FASTA format. More information can be accessed from http://www.ebi.ac.uk/interpro/about.html. Supplementary Material 2: I-TASSER: Iterative Threading ASSEmbly Refinement (I-TASSER) (http://zhanglab.ccmb.med.umich.edu/I-TASSER/) is an algorithm for predicting three-dimensional protein structure from amino acid sequences [10]. It identifies structure templates from the Protein Data Bank by fold recognition. The full-length structure models are created by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations [11]. Amino acid sequences are submitted as input by the users. More information about the I-TASSER can be accessed from http://zhanglab.ccmb.med.umich.edu/I- TASSER/about.html. Moreover, I-TASSER server was ranked No. 1 server in Critical Assessment of Techniques for Protein Structure Prediction (CASP) 7, 8, 9 and 10 respectively. I-TASSER also uses LOMETS V3.0 for protein structure prediction and the documentation can be accessed from http://zhanglab.ccmb.med.umich.edu/LOMETS/readme.txt. The best model for energy minimization was chosen based on the maximum C-value score and maximum number of decoys. C-score is a confidence score for estimating the quality of predicted models by I- TASSER. It is calculated based on the significance of threading template alignments and the convergence parameters of the structure assembly simulations. C-score is typically in the range of [-5,2], where a C-score of higher value signifies a model with a high confidence and viceversa. I-TASSER generates full length model of proteins by excising continuous fragments from threading alignments and then reassembling them using replica-exchanged Monte Carlo simulations. A higher cluster density means the structure occurs more often in the simulation trajectory and therefore signifies a better quality model. Discovery Studio: Discovery Studio is client-based-server suite and is developed and distributed by Accelry’s (http://accelrys.com/products/discovery-studio/). It is well known collection of various algorithms used for computational chemistry, computational biology, cheminformatics, molecular simulations and quantum mechanics. It uses many software algorithms such as CHARMM [12], MODELLER [13], DELPHI [14], ZDOCK [15], etc. All the thirty Peptides were prepared using Discovery Studio 3.1 module build and edit protein, in which build action was used to create and grow chains of amino acids as desired. The generated peptides were minimized using CHARMM force field using electrostatics spherical cutoff and the smart minimizer algorithms with maximum steps of 200. Swiss-PDB Viewer: It is a wonderful application that helps to analyze and minimize H-bonds, angles, distances between atoms, etc in proteins and it can be accessed from http://spdbv.vitalit.ch/. The generated three dimensional models from I-TASSER web application were further subjected to energy minimization using the steepest descent technique to eliminate bad contacts between protein atoms. The Swiss-PDB viewer uses GROMOS 43B1 force field [16] which is mainly used to repair distorted geometries by removing internal constrains. Energy minimization preferences were set to 1000 steps of steepest descent technique while the cutoff value was set to 0.500 Å. The delta E cutoff value was maintained at 0.030 kJ/mol and the force acting on any atom was set to a default value of 10.000. Energy minimization module in tools tab was used to start the process. The minimized model was selected for molecular dynamics simulation studies. GROMACS: GROMACS 4.5.4 package [17] and Amber99sb-ILDN force field [18] was implemented to examine the modeled proteins stability. The protein models were solvated with SPC-E water model that extend to 0.9 nm triclinic box from the molecule to the edge of the box. Periodic boundary conditions were applied in all directions and the total charge was adjusted to zero. Maximum of 50,000 energy minimization steps was carried out for the protein models using a steepest descent algorithm with a tolerance of 1000 kJ mol-1 nm-1. Consequently, 50,000 steps of a conjugate gradient algorithm are also used to minimize the protein models with a tolerance of 1000 kJ mol-1 nm-1. The solvated and minimized system were considered a reasonable one in terms of geometry and solvent orientation and used for further simulation steps. All bond angles were controlled with LINCS algorithm [19], while SETTLE algorithm [20] was used to constrain the geometry of the water molecules. Temperature was maintained (300 K) by V-rescale weak coupling method, while the Parrinello-Rahman method [21] was used to preserve the pressure (1 atm) of the system. The position restrains (PR) MD for both NVT (constant number of particles, volume and temperature) and NPT (constant number of particles, pressure and temperature) were carried out for 100 ps. This pre-equilibrated system was later used in the 3000 ps (3 ns) production MDS with a time-step of 2 fs. Structural coordinates were saved every 2 ps and analyzed using the analytical tool in the GROMACS package. The lowest potential energy conformations were selected from 3 ns MDS trajectory for further ProteinProtein Interaction as well as Protein-Peptide Interaction Studies. The refined models were validated using the structural analysis and verification server (SAVES). The above mentioned protocol was also used for molecular dynamics simulation studies of Protein-Peptide-Protein complexes which also proved the stability of the designed peptides. More information can be accessed from http://www.gromacs.org/. SAVES: Structural Analysis and Verification Server (SAVES) is a protein structure validation server (http://nihserver.mbi.ucla.edu/SAVES/). It used many web applications to come to a conclusion such as PROCHECK [22], ERRAT [23] and VERIFY_3D [24]. PROCHECK checks the stereochemical quality of a protein structure and overall structure geometry. ERRAT analyzes the non-bonded interactions between different atom types while VERIFY _3D examines 3D models with its amino acid sequence. The proteins under investigation were submitted with their pdb files for structure validation and verification. The parameters and working of SAVES server can be accessed from http://nihserver.mbi.ucla.edu/SAVES/Info.php. Supplementary Material 3: HADDOCK: HADDOCK is a user friendly and popular web server which has been extensively used for biomolecular docking [25]. It is one of the few docking software platforms that explicitly takes flexibility into account both in the side-chains and backbone of the proteins. It uses both NMR and non-NMR experimental information to guide the docking process. Haddock web server (http://haddock.science.uu.nl/services/HADDOCK/haddock.php) offers different levels of services to users and these could be accessed by a simple registration process. The pdb file of the proteins under consideration was submitted as an input file and the domain region was provided as active site residues. The default parameters by the webserver were used both for proteinprotein and protein-peptide docking and it can be accessed from http://haddock.science.uu.nl/services/HADDOCK/settings.html. LIGPLOT: LIGPLOT is a program which generates schematic diagrams of protein-ligand and protein-protein interactions [26] and it can be accessed from https://www.ebi.ac.uk/thorntonsrv/software/LIGPLOT/. Hydrogen bonds formation between proteins under investigation were analysed using the DIMPLOT module of LIGPLOT. The maximum H-A distance for hydrogen bond formation was kept at 2.70 Å while D-A distance was kept at 3.35 Å, where H=hydrogen, A=acceptor and D=donor respectively. PROPKA: PROPKA webserver (http://propka.ki.ku.dk/) helps to estimate pKa values of amino acids as they exist within proteins. PROPKA 3.1 was used to calculate the was also used to check the stability of the protein-peptide complex [27]. The pdb structure file was provided as input and the results were calculated based on the default values of the server. PISA: Protein Interfaces, Surfaces and Assemblies (PISA) webserver (http://www.ebi.ac.uk/pdbe/pisa/pistart.html) was deployed for salt bridge analysis [28]. The number of salt bridges is then used to assess the likely stability of the interface. PISA considers a distance of 4 Å for a salt bridge to form. The pdb structure file of proteins under consideration was submitted as input. Further details can be found from http://www.ebi.ac.uk/msd- srv/prot_int/pistart.html. European Bioinformatics Institute (EBI) is responsible for maintaining the webserver. DrugScorePPI: DrugscorePPI is a knowledge-based webserver for computational alaninescanning in protein-protein interfaces [29]. It uses QSAR approach with respect to experimental binding free energy differences between wildtype proteins and ALA mutants for protein-protein complex formation. This server automatically scans for the interface residues of given biomolecular complexes and it can be accessed from http://cpclab.uni-duesseldorf.de/dsppi/. Initially it calculates ΔGWT (wild type) and mutates one of the interface residues to alanine then calculates the ΔGMUT (mutant type) which allows succeeding calculation of ΔΔG (change in binding free energy) by subtracting the ΔGWT from the ΔGMUT. This procedure will be continued until ΔΔG of all the interface residues are calculated. The input was provided in the form of pdb file. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. Bairoch, A., et al., The Universal Protein Resource (UniProt). Nucleic Acids Res, 2005. 33(Database issue): p. D154-9. Laskowski, R.A., V.V. Chistyakov, and J.M. Thornton, PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Research, 2005. 33(suppl 1): p. D266-D268. Gasteiger, E., et al., Protein identification and analysis tools on the ExPASy server, in The proteomics protocols handbook. 2005, Springer. p. 571-607. Finn, R.D., et al., Pfam: clans, web tools and services. Nucleic Acids Research, 2006. 34(Database issue): p. D247-51. Johnson, L.S., S.R. Eddy, and E. Portugaly, Hidden Markov model speed heuristic and iterative HMM search procedure. BMC bioinformatics, 2010. 11(1): p. 431. Baum, L.E., et al., A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The annals of mathematical statistics, 1970: p. 164-171. Murzin, A.G., et al., SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of molecular biology, 1995. 247(4): p. 536-540. Pearl, F.M.G., et al., The CATH database: an extended protein family resource for structural and functional genomics. Nucleic acids research, 2003. 31(1): p. 452-455. Quevillon, E., et al., InterProScan: protein domains identifier. Nucleic Acids Res, 2005. 33(Web Server issue): p. W116-20. Roy, A., A. Kucukural, and Y. Zhang, I-TASSER: a unified platform for automated protein structure and function prediction. Nature protocols, 2010. 5(4): p. 725-738. Andrieu, C., et al., An introduction to MCMC for machine learning. Machine learning, 2003. 50(12): p. 5-43. Brooks, B.R., et al., CHARMM: the biomolecular simulation program. Journal of computational chemistry, 2009. 30(10): p. 1545-1614. Eswar, N., et al., Comparative protein structure modeling using Modeller. Current protocols in bioinformatics, 2006: p. 5.6. 1-5.6. 30. Rocchia, W., E. Alexov, and B. Honig, Extending the applicability of the nonlinear PoissonBoltzmann equation: Multiple dielectric constants and multivalent ions. The Journal of Physical Chemistry B, 2001. 105(28): p. 6507-6514. Chen, R., L. Li, and Z. Weng, ZDOCK: An initial‐stage protein‐docking algorithm. Proteins: Structure, Function, and Bioinformatics, 2003. 52(1): p. 80-87. Scott, W.R., et al., The GROMOS biomolecular simulation program package. The Journal of Physical Chemistry A, 1999. 103(19): p. 3596-3607. Pronk, S., et al., GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics, 2013. 29(7): p. 845-854. Lindorff-Larsen, K., et al., Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins-Structure Function and Bioinformatics, 2010. 78(8): p. 1950-1958. Hess, B., et al., LINCS: a linear constraint solver for molecular simulations. Journal of computational chemistry, 1997. 18(12): p. 1463-1472. Miyamoto, S. and P.A. Kollman, SETTLE: an analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of computational chemistry, 1992. 13(8): p. 952-962. Martoňák, R., A. Laio, and M. Parrinello, Predicting crystal structures: the Parrinello-Rahman method revisited. Physical review letters, 2003. 90(7): p. 075503. 22. 23. 24. 25. 26. 27. 28. 29. Laskowski, R.A., et al., PROCHECK: a program to check the stereochemical quality of protein structures. Journal of applied crystallography, 1993. 26(2): p. 283-291. Colovos, C. and T.O. Yeates, Verification of protein structures: patterns of nonbonded atomic interactions. Protein Science, 1993. 2(9): p. 1511-1519. Bowie, J.U., R. Luthy, and D. Eisenberg, A method to identify protein sequences that fold into a known three-dimensional structure. Science, 1991. 253(5016): p. 164-170. de Vries, S.J., M. van Dijk, and A.M. Bonvin, The HADDOCK web server for data-driven biomolecular docking. Nature Protocols, 2010. 5(5): p. 883-97. Laskowski, R.A. and M.B. Swindells, LigPlot+: Multiple Ligand-Protein Interaction Diagrams for Drug Discovery. Journal of Chemical Information and Modeling, 2011. 51(10): p. 2778-2786. Li, H., A.D. Robertson, and J.H. Jensen, Very fast empirical prediction and rationalization of protein pKa values. Proteins: Structure, Function, and Bioinformatics, 2005. 61(4): p. 704-721. Krissinel, E. and K. Henrick, Inference of macromolecular assemblies from crystalline state. Journal of molecular biology, 2007. 372(3): p. 774-797. Kruger, D.M. and H. Gohlke, DrugScorePPI webserver: fast and accurate in silico alanine scanning for scoring protein-protein interactions. Nucleic Acids Research, 2010. 38(Web Server issue): p. W480-6. Supplementary Material 4: TABLES: Table 6: Domain function screening for the proteins under consideration Tools in Use Pfam InterProScan Proteins AtPOT1b POT1 domain (13-143) Telo_bind domain (13-143) AtTRB1 Myb Linker DNA Histone binding family (123domain 178) (5-55) SANT/ Histone Myb H1/H5 domain domain (5-55) (123-182) AtTRB2 Myb DNA Histone binding H1/H5 domain domain (5-55) (125-182) AtTRB3 Myb DNA binding domain (5-55) SANT/My b domain (5-55) SANT/ Myb domain (5-55) Histone H1/H5 domain (125-182) Histone H1/H5 domain (122180) Table 7: List of top ten templates used by I TASSER for three dimensional (3D) structure prediction Protein Name Templates POT1b 2i0qA, 1jb7A , 1xjvA, 3kjpA TRB1 2osxA, 2lsoA, 1hstA, 4fsxA, 2juhA, 1h89A, 1hstA, 1h88C, 3hfwA TRB2 4fxgB, 2lsoA, 1hstA, 4fsxA, 2juhA, 1x58A, 1h88C, 4fxgB TRB3 4fxgB, 1hstA, 1zrtD, 2juhA, 1x58A, 1h88C, 4fxgB, 2lsoA Table 8: I-TASSER scores to identify the best model generated Protein Name POT1b TRB1 TRB2 TRB3 I-TASSER C-score No. of decoys Cluster density Models Model 1* -0.31 2089 0.1410 Model 2 -2.20 314 0.0212 Model 3 -1.00 1049 0.0708 Model 4 -3.27 108 0.0073 Model 5 -3.69 71 0.0048 Model 1* -3.24 703 0.0242 Model 2 -3.37 621 0.0214 Model 3 -3.66 461 0.0159 Model 4 -3.98 335 0.0115 Model 5 -4.19 272 0.0094 Model 1* -2.44 2315 0.0533 Model 2 -4.67 251 0.0058 Model 3 -4.73 235 0.0054 Model 4 -4.90 199 0.0046 Model 5 -5.00 175 0.0040 Model 1 * -2.20 2336 0.0696 Model 2 -3.54 611 0.0182 Model 3 -4.69 193 0.0058 Model 4 -4.82 169 0.0050 Model 5 -4.86 162 0.0048 * represents the best models generated by I-TASSER server. The number of decoys ranged from 703 to 2336 as shown in Table 2. Template modelling score (TM-score) was used to find the structural similarity between the models and templates. The TM-score for best model were revealed to be was 0.67±0.13, 0.35±0.12, 0.43±0.14 and 0.45±0.15 for the proteins AtPOT1b, AtTRB1, AtTRB2 and AtTRB3 respectively. The values of decoys are directly proportional to the value of clusters. More the number of decoys, more the density value, which indirectly influenced the stability of structures and less the c-score values, more the decoy values which also supports for choosing the best structure. Based on this logic and principle, the best models were identified and were further taken up for energy minimization. Table 9: ProMotif results for all proteins from PDBsum server Protein Name No.of No. of Sheet Beta Hairpins AtPOT1B 7 7 No. of Psi loop 1 No. of No. of No. of Beta strand helices bulges 5 19 7 No. of No. of Helix-Helix Beta Interaction turns 1 60 No. of Gamma turns 10 AtTRB1 None None None None None 15 27 33 11 AtTRB2 1 1 None None 2 13 18 28 8 AtTRB3 1 1 None 1 2 13 12 31 7 Table 10: Salt Bridge Interactions of the three complexes as detected through PISA AtPOT1b A:ARG 13 A:ARG 282 A:ARG 282 A:GLU 149 A:GLU 149 A:ASP 116 A:ASP 116 A:ASP 116 A:ASP 116 A:GLU 141 A:GLU 141 A:GLU 130 A:ASP 50 A:ASP 16 A:ASP 16 A Dist. (Å) 2.77 3.72 3.03 3.9 2.99 2.71 3.26 3.67 2.62 2.78 2.71 2.59 2.6 2.67 2.76 AtTRB1 AtPOT1b B:ASP 122 B:ASP 149 B:ASP 149 B:ARG 120 B:ARG 120 B:ARG 159 B:ARG 159 B:ARG 159 B:ARG 159 B:LYS 164 A:LYS 10 A:LYS 10 A:ASP 8 A:ASP 8 A:ASP 8 A:ASP 16 A:ASP 16 A:ASP 16 A:ASP 16 A:GLU 141 B:LYS 164 B:LYS 166 B:LYS 173 B:LYS 176 B:LYS 176 B Dist. (Å) 2.88 2.79 3.81 3.77 2.67 2.67 3.56 3.34 2.67 2.6 AtTRB2 AtPOT1b B:GLU 153 B:GLU 254 B:ARG 293 B:ARG 293 B:ARG 293 B:ARG 163 B:ARG 163 B:ARG 163 B:ARG 163 A:ASP 16 A:ASP 16 A:ASP 16 A:ASP 50 A:ASP 50 A:GLU 149 A:GLU 149 A:GLU 149 A:GLU 149 B:LYS 127 C Dist. (Å) 3.64 2.65 3.85 2.65 2.82 2.73 3.55 3.5 2.7 AtTRB3 B:LYS 135 B:ARG 136 B:ARG 136 B:LYS 135 B:LYS 135 B:ARG 161 B:ARG 161 B:ARG 161 B:ARG 161 Table 11: Propensity of important interacting residues between AtPOT1b and AtTRB1-3 AtPOT1b (Total no. of AtTRB1-3 (Total No. of amino acids) amino acids)(Chain A) (Chain B) 6 Arginine 3 Aspartic acid / 1 Glycine/ 1 Threonine/ 1 Lysine 6 Asparagine 2 Arginine/ 1 Asparagine/ 1 Leucine/ 1 Aspartic acid/ 1 Serine 10 Glutamic acid 4 Arginine/ 2 Lysine/ 1 Serine/ 1 Tryptophan/ 1 Tyrosine/ 1 Glutamine 4 Serine 1 Serine/ 1 Aspartic Acid/ 1 Arginine/ 1 Asparagine 2 Tryptophan 1 Threonine/ 1 Aspartic acid 14 Aspartic acid 5 Lysine/ 7 Arginine/ 2 Asparagine Number denotes the number of times the amino acids are involved in making hydrogen bond formation. FIGURE LEGENDS: Figure 8: Two and three dimensional structures of best I-TASSER models after 3ns simulation. 2D figures were generated in PDBsum server while 3D was prepared using Chimera software. A, E represents AtPOT1b; B, F represents AtTRB1; C, G represents AtTRB2; D, H represents AtTRB3. Oval dashed lines represent the specific regions of protein which interacts with each other for all the three proteins while the positions of interacting residues can be inferred from 2D figures. Figure 9 A-D: Ramachandran Plot of the four proteins as depicted by PROCHECK server. AtPOT1b, AtTRB1, AtTRB2 and AtTRB3 are represented by Figure A, B, C and D respectively. Most favored regions are colored red, additional allowed as yellow, generously allowed as light yellow and disallowed regions as white fields respectively. Figure 10 A: DIMPLOT result of AtPOT1b-AtTRB1 interaction. Blue labels represent interacting residues of AtPOT1b while red represents AtTRB1. Figure 10 B: DIMPLOT result of AtPOT1b-AtTRB2 interaction. Blue labels represent interacting residues of AtPOT1b while red represents AtTRB2. Figure 10 C: DIMPLOT result of AtPOT1b-AtTRB3 interaction. Blue labels represent interacting residues of AtPOT1b while red represents AtTRB3. Figure 11 A-B: PROPKA results of AtPOT1b with peptide and without peptide. ‘A’ represents AtPOT1b without peptide while ‘B’ represents AtPOT1b with peptide. Unbound AtPOT1b requires very high energy for stability with 23.5 kcal mol-1 while Peptide bound AtPOT1b requires very low energy for stability with -44.4 kcal mol-1 suggesting the binding of the peptide to AtPOT1b is highly stable. Figure 12: Distribution of amino acids of Protection of Telomeres 1 (POT1) protein in different organisms. Amino acid frequency of Protection of Telomeres 1 (POT1) protein in different organisms suggests abundant presence of Leucine in all organisms. Figure 13: Distribution of amino acids of Telomerase Reverse Transcriptase (TERT) protein in different organisms. Amino acid frequency of Telomerase Reverse Transcriptase (TERT) protein in different organisms shows abundant presence of Leucine in all organisms. FIGURES: Figure 8: Figure 9: Figure 10 A: Figure 10 B: Figure 10 C: Figure 11: Figure 12: Arabidopsis_thaliana Arabidopsis_lyrata Olimarabidopsis_pumila Lepidium_alyssoides Brassica_oleracea Neslia_paniculata Boechera_platysperma Cardaminopsis_arenosa Arabidopsis_neglecta Turritis_glabra Cardamine_pulchella Pachycladon_stellatum Lepidium_draba Matthiola_integrifolia Euclidium_syriacum Avg. Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Total 4.185 3.5242 5.5066 6.6079 5.9471 4.4053 2.4229 6.8282 6.3877 9.0308 3.0837 4.185 4.6256 2.8634 6.8282 7.9295 4.185 6.8282 1.9824 2.6432 454 4.4444 3.5556 4.8889 6.2222 5.5556 4.4444 2.8889 6.8889 6.2222 9.3333 2.8889 4.4444 4.4444 3.3333 6 7.7778 4.6667 7.3333 2 2.6667 450 4.3796 3.4063 5.1095 5.3528 5.8394 5.1095 2.6764 6.8127 6.326 9.7324 2.4331 3.6496 4.8662 3.4063 6.5693 8.2725 3.8929 7.5426 1.7032 2.9197 411 4.3796 3.4063 5.3528 5.5961 5.8394 4.8662 2.4331 6.8127 6.0827 9.7324 2.4331 4.1363 4.8662 3.4063 6.8127 7.7859 3.8929 7.5426 1.7032 2.9197 411 4.8889 3.1111 6.2222 5.3333 7.1111 4.6667 2.2222 6.6667 6 8.8889 2.2222 4.6667 5.3333 2.8889 6.2222 8.4444 4.6667 6.4444 1.7778 2.2222 450 4.6569 3.6765 4.1667 6.1275 5.1471 4.4118 2.9412 6.6176 6.1275 10.539 2.6961 4.4118 4.902 3.4314 6.1275 7.598 4.4118 7.8431 1.4706 2.6961 408 4.3902 3.1707 4.3902 6.3415 6.0976 5.3659 2.9268 7.0732 6.8293 9.7561 2.1951 3.9024 4.878 3.4146 6.3415 7.561 4.3902 6.8293 1.4634 2.6829 410 4.8469 3.3163 4.0816 5.8673 5.8673 5.3571 2.8061 6.3776 6.8878 11.48 2.2959 3.8265 4.5918 2.8061 5.8673 8.9286 3.8265 7.1429 1.2755 2.551 392 5.102 2.8061 4.3367 5.6122 6.1224 5.3571 3.0612 6.3776 6.8878 11.48 2.2959 3.8265 4.5918 2.8061 6.1224 8.4184 3.5714 7.398 1.2755 2.551 392 4.6036 3.5806 4.6036 5.6266 5.8824 5.6266 3.0691 6.6496 6.9054 10.742 2.046 3.3248 4.8593 2.3018 6.3939 7.1611 4.3478 8.1841 1.2788 2.8133 391 3.8929 3.4063 4.6229 7.2993 5.8394 5.8394 2.4331 5.8394 6.0827 10.706 1.9465 4.1363 5.1095 3.4063 6.326 7.2993 5.1095 7.056 1.2165 2.4331 411 4.6341 3.4146 5.122 6.0976 6.8293 5.6098 2.9268 7.3171 7.0732 10 1.4634 2.9268 5.3659 3.1707 5.3659 6.5854 4.6341 6.5854 1.9512 2.9268 410 4.6154 3.3333 4.8718 5.641 6.9231 5.641 3.3333 6.6667 6.6667 9.7436 2.0513 2.5641 4.6154 2.8205 6.1538 8.7179 4.359 7.1795 1.5385 2.5641 390 5.102 3.5714 5.102 5.6122 7.1429 4.8469 3.3163 6.8878 7.1429 9.949 1.5306 3.3163 4.5918 3.3163 6.3776 7.6531 4.5918 6.3776 1.5306 2.0408 392 4.6036 3.3248 5.8824 4.8593 7.4169 5.1151 3.3248 6.3939 7.6726 8.6957 2.046 4.6036 4.6036 2.8133 5.3708 7.1611 5.3708 7.6726 1.5345 1.5345 391 4.5757 3.375 4.9651 5.89 6.2307 5.0949 2.8395 6.6851 6.6039 9.9627 2.2554 3.878 4.8191 3.0829 6.1983 7.8209 4.3972 7.1881 1.5901 2.5475 410.87 Figure 13: Homo_sapiens Mus_musculus Rattus_norvegicus Arabidopsis_thaliana Oryza_sativa Tetrahymena_thermophila Bos_taurus Canis_familiaris Oxytricha_trifallax Avg. Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Total 8.7456 2.5618 3.0035 3.9753 4.1519 6.6254 3.0035 2.0318 3.5336 12.986 1.0601 1.8551 7.6855 4.1519 11.042 6.6254 5.1237 7.7739 1.5901 2.4735 1132 5.8824 3.1194 3.2086 3.4759 4.902 4.8128 2.9412 3.1194 4.6346 13.369 2.139 2.7629 6.0606 5.4367 8.6453 8.7344 5.3476 6.8627 1.426 3.1194 1122 6.4889 3.0222 3.1111 3.2889 4.9778 5.6889 2.8444 2.8444 5.1556 13.067 1.9556 2.6667 6.4 5.1556 7.9111 8.6222 5.4222 7.0222 1.4222 2.9333 1125 2.7605 3.2057 4.9866 4.3633 4.8085 4.0071 3.2057 5.4319 8.1033 10.864 1.6919 5.0757 4.1852 4.1852 7.3909 10.062 3.9181 6.5895 1.6028 3.5619 1123 4.9245 4.7657 4.448 3.4948 4.2097 4.6863 3.2566 5.8777 7.3074 9.2931 1.9063 5.4011 3.6537 3.4154 7.1485 11.676 3.5743 5.56 1.1914 4.2097 1259 1.5219 1.4324 4.3868 6.0877 7.0725 2.7753 0.8953 9.3107 11.638 10.027 1.6115 9.8478 2.1486 9.3107 2.6858 5.6401 3.6705 4.2077 0.6267 5.103 1117 10.756 2.7556 2.9333 3.5556 4.1778 8.2667 2.6667 1.3333 2.8444 13.6 0.8889 1.8667 7.7333 4.5333 12.089 5.6889 3.7333 7.2889 1.1556 2.1333 1125 10.508 3.0276 2.7605 3.3838 4.0962 6.9457 3.0276 2.0481 3.1167 13.802 1.1576 2.1371 7.6581 4.3633 10.686 6.0552 4.3633 6.9457 1.2467 2.6714 1123 3.6219 1.8551 3.9753 6.0954 7.9505 3.1802 1.4134 7.6855 12.014 8.8339 3.0035 9.4523 2.3852 6.3604 3.4452 5.3004 4.1519 4.2403 0.7951 4.2403 1132 6.1221 2.8856 3.6557 4.1821 5.1375 5.2154 2.5931 4.4258 6.4925 11.727 1.7157 4.572 5.3032 5.1862 7.8865 7.6526 4.3576 6.2683 1.2283 3.3925 1139.8