Analysis of the functional regions predicted by PatchFinder PatchFinder detects regions of evolutionarily conserved residues on the protein surface in proteins with known 3D structure (Nimrod et al., 2005; Nimrod et al., 2008). These regions (patches) typically correspond to the functional regions of these proteins. We examined PatchFinder predictions on a dataset of 121 DBPs which were co-crystallized with DNA. In 74% of the cases at least half of the positions in the patch were in contact with the DNA, but only in 17% of the cases the patch covered at least half of that interface (Nimrod et al., 2009). We further analyzed the protein-DNA interfaces in these structures using NUCPLOT (Luscombe et al., 1997). According to this analysis, PatchFinder’s patches covered at least half of the interface hydrogen bonds in 50% of the proteins (Nimrod et al., 2009). Hydrogen bonds are often important in protein-DNA sequence recognition (Mandel-Gutfreund et al., 1995). Distinction between protein residues that contact the DNA bases and the rest of the interface residues (i.e. those that contact the sugarphosphate backbone) may provide some information on the residue positions that are responsible for binding specificity (Lichtarge et al., 1997). Supplementary Figure 1 shows that in 47% of the cases at least half of the residues that contact the DNA-bases were in the PatchFinder patch, which is more than twice the fraction that met the same criteria for the whole interface. Supplementary Figure 1 suggested a partitioning of PatchFinder’s patches into two groups, according to the fraction of protein-DNA bases interactions identified by each patch. We used the DAVID Bioinformatic Resources (Huang da et al., 2009) in order to characterize the proteins for which PatchFinder identified the highest fraction of protein-DNA bases contacts in comparison with the whole dataset of DBPs. The DAVID analysis found that this group is enriched with the molecular function GO term (Ashburner et al., 2000) ‘transcription regulator activity’ (p-value 4.1e-4 ). This GO term is mainly associated with transcription factors which typically require specific recognition of the DNA. By contrast, inspection of the proteins whose MLpatches were devoid of protein-DNA bases interactions showed that this group is enriched with the GO term ‘calatlytic activity’ (p-value 7.3e-3). In these proteins, the main conservation signal came from the catalytic site rather than the recognition site. A similar trend was noticed with the analysis of protein-DNA interface hydrogen bonds (Nimrod et al., 2009). Example of a protein predicted to bind DNA PDB entry 3BHW corresponds to the structure of an uncharacterized protein of 199 residues from Magnetospirillum magneticum, solved by the New York SGX Research Center for Structural Genomics (NYSGXRC). The protein was assigned the PFAM domain DUF1819 of uncharacterized proteins (Finn et al., 2008). Some of the entries associated with this PFAM domain are putative inner membrane proteins. The iDBPs server predicts that the protein binds DNA. We examined the protein with the Dali server for the identification of structural similarity between proteins (Holm and Park, 2000). Dali found structural similarity between 3BHW and several DNA binding proteins including a restriction endonuclease BpuJI from Bacillus pumilus that was crystallized bound to a dsDNA (PDB entry 2VLA; (Sukackaite et al., 2008)). Supplementary Figure 2.A presents a superimposition of PDB entry 3BHW (green ribbons) and BpuJI (blue ribbons) bound to a dsDNA (orange). Besides the high structural similarity between the proteins, the superimposition also placed 3BHW in a potential DNA binding position with only minor clashes with the DNA. Electrostatic potential analysis of 3BHW (Supplementary Figure 2.B), showed that the suggested DNA binding region is positively charged as well. This property is typical of DNA binding regions (Stawiski et al., 2003). We also examined the protein using the DBDhunter web server for the identification of DBPs. DBD-hunter is a knowledge-based method that looks for structural similarity between the query protein and known DBPs and analyses the DNA-protein statistical potential (Gao and Skolnick, 2008). This approach, which is complementary to ours, also found 3BHW as a potential DBP. Our analysis therefore provides useful information regarding the possible functional role of this hypothetical protein, awaiting experimental validation. Moreover, we suggest potential residues that directly participate in the predicted binding of DNA (see Supplementary Figure 3). These positions are candidates for future mutagenesis efforts, aimed at validating the computational functional prediction. Supplementary Figure 1. Analysis of PatchFinder’s predictions for DNA binding regions. The plot presents the fraction of proteins within each sensitivity bin in a dataset of 121 DBPs which were crystallized bound to DNA. Black bars represent the sensitivity in the identification of protein residues which are within 6Å from the DNA (Nimrod et al., 2009). Grey bars represent the identification of protein residues which contact DNA bases. The calculations were conducted using NUCPLOT (Luscombe et al., 1997). According to the analysis most of the patches are associated with one of the extreme groups in the graph. They either cover more than 90% of the positions that contact the DNA-bases or less than 10% of these positions. The first group is enriched with transcription factors which typically require DNA sequence recognition. The second group, on the other hand, is enriched with DBPs with catalytic activity. Supplementary Figure 2. A. Superimposition of PDB entry 3BHW from Magnetospirillum magneticum (green ribbons) with restriction endonuclease BpuJI from Bacillus pumilus (blue ribbons) bound to a dsDNA (orange; (Sukackaite et al., 2008)). The superimposition was calculated using the C-alpha Match algorithm (Bachar et al., 1993) and the figure was produced using UCSF Chimera (Pettersen et al., 2004). B. The electrostatic potential of 3BHW mapped on the protein surface according to the coloring scale on the left. The potentials were calculated using PMV (Sanner, 1999) and the figure was produced with PyMol (DeLano, 2002). Overall, the figure shows evidence that supports the prediction that 3BHW may bind DNA. Supplementary Figure 3. A screenshot of PatchFinder’s analysis for PDB entry 3BHW provided by the iDBPs server. PatchFinder found two patches with high confidence to be functionally important. The molecule is in space-filling representation. The primary and the secondary patches are colored red and blue respectively and the rest of the protein is in grey. Assuming 3BHW is a DBP, these patches of evolutionarily conserved residues are good candidates for DNA binding residues. Similar analysis is provided by the iDBPs server for every submitted protein structure, when enough sequence homologs are available (see main text). References Ashburner, M. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25-29. Bachar, O. et al. (1993) A computer vision based technique for 3-D sequenceindependent structural comparison of proteins. Protein Eng., 6, 279-288. DeLano, W.L. (2002) The PyMOL molecular graphics system. Finn, R.D. et al. (2008) The Pfam protein families database. Nucleic Acids Res, 36, D281-288. Epub 2007 Nov 2026. Gao, M. and Skolnick, J. (2008) DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions. Nucleic Acids Res, 36, 3978-3992. Holm, L. and Park, J. (2000) DaliLite workbench for protein structure comparison. Bioinformatics, 16, 566-567. Huang da, W. et al. (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc, 4, 44-57. Lichtarge, O. et al. (1997) Identification of functional surfaces of the zinc binding domains of intracellular receptors. J Mol Biol, 274, 325-337. Luscombe, N.M. et al. (1997) NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucleic Acids Res., 25, 4940-4945. Mandel-Gutfreund, Y. et al. (1995) Comprehensive analysis of hydrogen bonds in regulatory protein DNA-complexes: in search of common principles. J. Mol. Biol., 253, 370-382. Nimrod, G. et al. (2005) In silico identification of functional regions in proteins. Bioinformatics, 21, i328-337. Nimrod, G. et al. (2008) Detection of functionally important regions in 'hypothetical proteins' of known structure. Structure, 16, 1755-1763. Nimrod, G. et al. (2009) Identification of DNA-binding proteins using structural, electrostatic and evolutionary features. J. Mol. Biol., 387, 1040-1053. Pettersen, E.F. et al. (2004) UCSF Chimera - a visualization system for exploratory research and analysis. J. Comput. Chem., 25, 1605-1612. Sanner, M.F. (1999) Python: a programming language for software integration and development. J. Mol. Graph. Model., 17, 57-61. Stawiski, E.W. et al. (2003) Annotating nucleic acid-binding function based on protein structure. J. Mol. Biol., 326, 1065-1079. Sukackaite, R. et al. (2008) The recognition domain of the BpuJI restriction endonuclease in complex with cognate DNA at 1.3-A resolution. J Mol Biol, 378, 1084-1093.