Supplementary Methods

Adaptive Smith-Waterman Residue Match Seeding for Protein Structural Alignment

Christopher M. Topham1-3,*, Mickaël Rouquier1-3, Nathalie Tarat1-3 and Isabelle André1-3,*

1 Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
2 CNRS, UMR5504, F-31400 Toulouse, France
3 INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France

*Authors for correspondence: e-mail christopher.topham@insa-toulouse.fr; isabelle.andre@insa-toulouse.fr

Running title: Protein structural alignment

Supplementary method S1:

The PANORAMA protein structure analysis program is integrated with a database of amino acid, nucleotide, sugar, lipid and small molecule residue library files, derived from atom connectivity data in the remediated version 3 RCSB PDB (Henrick et al., 2008) chemical component dictionary (http://www.wwpdb.org/ccd.html). The library files contain additional data defining molecular geometry at central heavy atom positions, together with chemical property annotations, including hydrogen bond donor/acceptor atom status and aromatic ring atom assignments obtained by applying Hückel's 4n + 2 π-electron rule.

PANORAMA first searches for inter-residue covalent bonds in macromolecular co-ordinate data sets using generous distance and valence bond angle range cut-offs. Individual molecules are then identified as assemblies of covalently bonded atoms, and classed according to type (protein, nucleic acid, polysaccharide etc.). Small molecule ligands are operationally defined as molecules with ≥ 6 and ≤ 100 atoms. Once the complete bond connectivity has been determined, the molecular geometry of atoms at residue junctions is reset, and the number of attached hydrogen atoms is updated accordingly.

Secondary structural analysis is performed using an implementation of the Kabsch and Sander (1983) algorithm.
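As a minimal sketch of the molecule-identification step described above, the following Python groups atoms into covalently connected assemblies with a union-find structure and then applies the 6 to 100 atom size window for small molecule ligands. It assumes that the covalent bonds have already been detected and are supplied as pairs of atom indices. The function names are hypothetical and are not taken from PANORAMA. The bond-detection cut-offs and the classification by molecule type are not reproduced, because the text does not give their values or rules.

```python
from collections import defaultdict


def group_molecules(n_atoms, bonds):
    """Group atom indices into covalently connected assemblies (union-find)."""
    parent = list(range(n_atoms))

    def find(i):
        # Follow parent links to the set representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two assemblies

    groups = defaultdict(list)
    for i in range(n_atoms):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=lambda g: g[0])


def is_small_molecule_ligand(atom_count, min_atoms=6, max_atoms=100):
    """Size window from the text: >= 6 and <= 100 atoms."""
    return min_atoms <= atom_count <= max_atoms


# Toy input (illustrative only): a six-membered ring, a bonded pair, a lone atom.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (6, 7)]
molecules = group_molecules(9, bonds)
ligand_flags = [is_small_molecule_ligand(len(m)) for m in molecules]
```

In this toy input only the six-atom ring falls inside the size window; a real implementation would also exclude assemblies already classed as protein, nucleic acid or polysaccharide.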
Main-chain residue conformation is described by four classes: (α, 3₁₀ or π) helix, β-strand, +ve φ dihedral angle, or random coil. Disulphide-bonded cysteine residues are identified as having Sγ–Sγ inter-atomic separation distances of < 3.0 Å (Kabsch and Sander, 1983), and stereochemical geometries compatible with the Sowdhamini et al. (1989) A, B or C quality grades. A control check is also carried out for the proximity of co-ordinating metals, which is incompatible with disulphide bonding. Disulphide bridge partner ambiguities in poorer quality structures are resolved by systematic searching of all feasible arrangements within a given network using a summed empirical disulphide geometry quality score.

Atomic solvent-accessible surface areas are calculated using the Shrake and Rupley (1973) method with a probe radius of 1.4 Å and a test-point density of approximately 2000 points per atom. The van der Waals radii of Chothia (1976) are used for atoms in the 20 standard amino acid residues and for chemically equivalent atoms in other residue types. The van der Waals radius for phosphorus is taken from Lesk (1991), and radii for all other elements from Flower (1997). Hydrogen atoms are not included in the accessibility calculations. Relative side-chain solvent accessibilities are computed with respect to summed side-chain atom accessibilities in extended conformation Ala-X-Ala tri-peptides, geometry-optimised using GAMESS (Schmidt et al., 1993) at a high level of quantum chemical theory (B3LYP/6-311++G(d,p)) in a conductor-like polarized continuum model (CPCM) (Miertuš et al., 1981; Barone and Cossi, 1998; Cossi et al., 2003) to account for solvent electrostatics (C.M. Topham and J.C. Smith, unpublished results). Side-chains with relative solvent accessibilities of ≤ 7% are classed as inaccessible.
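The accessibility calculation described above can be sketched as follows. This is an illustrative Python version of the Shrake and Rupley test-point method, not the PANORAMA implementation. The golden-spiral placement of test points, the function names and the atom radius of 1.9 Å in the toy example are assumptions made here; the Chothia (1976) radii and the Ala-X-Ala reference areas are not reproduced.

```python
import math


def sphere_points(n):
    """Quasi-uniform points on the unit sphere (golden spiral; assumed choice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return points


def shrake_rupley(coords, radii, probe=1.4, n_points=2000):
    """Solvent-accessible surface area (in square Å) of each atom."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe  # radius of the probe-expanded sphere of atom i
        # Keep only neighbours whose expanded spheres can overlap atom i's.
        nbrs = [(cj, (rj + probe) ** 2)
                for j, (cj, rj) in enumerate(zip(coords, radii))
                if j != i and math.dist(ci, cj) < big_r + rj + probe]
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz
            # A test point is exposed if it lies inside no neighbour's sphere.
            if all((px - c[0]) ** 2 + (py - c[1]) ** 2 + (pz - c[2]) ** 2 >= r2
                   for c, r2 in nbrs):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


def is_inaccessible(side_chain_area, reference_area, cutoff_percent=7.0):
    """Relative accessibility <= 7% of the tri-peptide reference value."""
    return 100.0 * side_chain_area / reference_area <= cutoff_percent


# Toy example: two atoms of illustrative radius 1.9 Å placed 3.3 Å apart.
areas = shrake_rupley([(0.0, 0.0, 0.0), (0.0, 0.0, 3.3)], [1.9, 1.9])
```

For two equal expanded spheres whose centre separation equals their expanded radius, one quarter of each sphere is buried, which gives a simple analytic check on the point-counting estimate.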
Hydrogen bonds are assigned to donor-acceptor atom pairs separated by ≤ 3.9 Å (or 4.0 Å for hydrogen bonds involving sulphur atoms), subject to a hydrogen-acceptor separation distance of ≤ 2.5 Å. Hydrogen atoms are positioned at donor atom centres using idealised internal coordinate parameter values (McDonald and Thornton, 1994). The placement of single hydrogen atoms at tetrahedral centres, histidine protonation state assignment, and the resolution of ambiguous asparagine and glutamine carbamoyl group and histidine imidazole ring flip-states are achieved through optimisation of summed interaction energies in local hydrogen bond networks using a genetic algorithm. The additive energy function is composed of three terms: (i) a 6-4 direction-dependent hydrogen bond function (Goodford, 1985), employing Autodock 3 minimum-energy separation and well-depth energy parameters (Morris et al., 1998), (ii) a shallow energy well Lennard-Jones 12-6 function, with collision diameter parameters of 2.47 Å or 3.60 Å to prevent the creation of unfavourable hydrogen-hydrogen or hydrogen-metal close contacts, respectively, and (iii) the protonation-state and flip-state energy penalties of Hooft et al. (1996).

Ligand-contacting protein residues are identified using a variant of the occluded molecular surface analysis method (Pattabiraman et al., 1995), based on vector projection normal to the contact surface component of the molecular surface (Richards, 1977) of the residue in the absence of the surrounding protein environment. Contact surface unit normal vectors are determined for residue atoms in isolated tri-peptides centred on the residue of interest (or for half-cystine, in the isolated branched-chain tetra-peptide containing the disulphide-bonded partner residue) using the Shrake and Rupley (1973) algorithm and the same parameter values as for the solvent accessibility calculations described above.
Atoms of the occluded surface, to which neighbouring atoms in the tri-peptide do not contribute, are represented as their van der Waals spheres. The maximum vector outward projection distance from the contact surface is set to 2.8 Å, approximating the diameter of a water molecule. Protein residues with ligand atom contributions to the occluded contact surface of at least one main-chain or side-chain atom are considered to be in contact with the ligand. Metal ion coordinating protein residues are identified independently as residues possessing one (or more) unprotonated oxygen, nitrogen or sulphur atom(s) within a separation distance of ≤ 2.5 Å of the metal ion.

Protein residue aromatic environments are quantified using the occluded contact surface analysis method described above. Residues with > 2% of summed side-chain, main-chain carbonyl or –NH group atom occluded contact surface areas composed of aromatic atoms are considered to be in an aromatic environment.

REFERENCES:

Barone, V. and Cossi, M. (1998) Quantum calculation of molecular energies and energy gradients in solution by a conductor solvent model. J. Phys. Chem. A, 102, 1995-2001.

Chothia, C. (1976) The nature of the accessible and buried surfaces in proteins. J. Mol. Biol., 105, 1-14.

Cossi, M., Rega, N., Scalmani, G. and Barone, V. (2003) Energies, structures, and electronic properties of molecules in solution with the C-PCM solvation model. J. Comp. Chem., 24, 669-681.

Flower, D.R. (1997) SERF: a program for accessible surface area calculations. J. Mol. Graphics Mod., 15, 238-244.

Goodford, P.J. (1985) A computational procedure for determining energetically favourable sites on biologically important molecules. J. Med. Chem., 28, 849-857.
Henrick, K., Feng, Z., Bluhm, W.F., Dimitropoulos, D., Doreleijers, J.F., Dutta, S., Flippen-Anderson, J.L., Ionides, J., Kamada, C., Krissinel, E., Lawson, C.L., Markley, J.L., Nakamura, H., Newman, R., Shimuzu, Y., Swaminathan, J., Velankar, S., Ory, J., Ulrich, E.L., Vranken, W., Westbrook, J., Yamashita, R., Yang, H., Young, J., Yousufuddin, M. and Berman, H.M. (2008) Remediation of the protein data bank archive. Nucleic Acids Res., 36, D426-D433.

Hooft, R.W.W., Sander, C. and Vriend, G. (1996) Positioning hydrogen atoms by optimising hydrogen-bond networks in protein structures. Proteins: Struct. Funct. Genet., 26, 363-376.

Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.

Lesk, A.M. (1991) Protein architecture: a practical approach. Oxford University Press, Oxford, New York, Tokyo, p49.

McDonald, I.K. and Thornton, J.M. (1994) Satisfying hydrogen bonding potential in proteins. J. Mol. Biol., 238, 777-793.

Miertuš, S., Scrocco, E. and Tomasi, J. (1981) Electrostatic interaction of a solute with a continuum. A direct utilization of ab initio molecular potentials for the provision of solvent effects. Chem. Phys., 55, 117-129.

Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R., Hart, W.E., Belew, R.K. and Olson, A.J. (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comp. Chem., 19, 1639-1662.

Pattabiraman, N., Ward, K.B. and Fleming, P.J. (1995) Occluded molecular surface: analysis of protein packing. J. Mol. Recognition, 8, 334-344.

Richards, F.M. (1977) Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng., 6, 151-176.

Shrake, A. and Rupley, J.A. (1973) Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol., 79, 351-371.

Schmidt, M.W., Baldridge, K.K., Boatz, J.A., Elbert, S.T., Gordon, M.S., Jensen, J.H., Koseki, S., Matsunaga, N., Nguyen, K.A., Su, S., Windus, T.L., Dupuis, M. and Montgomery, J.A. (1993) General Atomic and Molecular Electronic Structure System. J. Comput. Chem., 14, 1347-1363.

Sowdhamini, R., Srinivasan, N., Shoichet, B., Santi, D.V., Ramakrishnan, C. and Balaram, P. (1989) Stereochemical modeling of disulfide bridges. Criteria for introduction into proteins by site-directed mutagenesis. Prot. Eng., 3, 95-103.