Practical Assignment – Identifying protein interaction interfaces with Perl scripting

Calculating the solvent accessible surface area is a common way to identify the interface residues of a protein complex structure. The basic idea is that residues which are exposed on the surface of the unbound protein but are buried (or less exposed) in the complex are the ones located at the interface.

The assignment consists of writing a Perl script that automatically identifies the interface residues in the structure of a protein complex.

STEP 1

Protein 1: Arabidopsis Histidine Kinase 4
File: Arabidopsis_HistidineKinase4.pdb

Protein 2: Arabidopsis Trans Zeatin
File: Arabidopsis_TransZeatin.pdb

Complex: Protein 1 in complex with Protein 2
File: Arabidopsis_Complex_HistidineHinase4_TransZeatin.pdb

You can use protein structure visualization tools (e.g. PyMOL or RasMol) to inspect the structures of the two proteins and of the complex.

STEP 2

We use the program NACCESS (Hubbard SJ, 1993) to calculate the residue solvent accessibility (RSA). We compare the RSA calculated from the unbound structures (Protein 1 and Protein 2) with the RSA calculated from the complex structure; residues that show a lower RSA in the complex are identified as interface residues.

RSA file protein 1: Arabidopsis_HistidineKinase4.rsa
RSA file protein 2: Arabidopsis_TransZeatin.rsa
RSA file complex: Arabidopsis_Complex_HistidineHinase4_TransZeatin.rsa

The beginning and end of an example .rsa file are shown below. Each residue record starts with RES. Counting the whitespace-separated columns from RES as column 1: column 2 gives the amino acid type; column 3 indicates which protein (chain) the residue belongs to, here A = Protein 1 and B = Protein 2; column 4 gives the integer residue number; column 5 gives the absolute all-atom accessibility; and column 6 gives the corresponding relative accessibility (%).
RES MET A 126   135.22  69.6   80.60  51.5   54.62 145.6   82.33  52.2   52.89
RES ASP A 127    77.65  55.3   75.75  73.8    1.90   5.0   24.29  49.3   53.35
RES ASP A 128   108.96  77.6  105.64 102.9    3.33   8.8   38.93  79.1   70.03
RES ALA A 129    49.34  45.7   46.49  67.0    2.85   7.4   46.49  65.1    2.85
...
RES ARG B 390    89.23  37.4   89.23  44.3    0.00   0.0   32.58  41.9   56.65
RES TYR B 391    26.45  12.4   19.16  10.8    7.29  20.6   17.04  12.5    9.41

STEP 3 – Perl Scripting

Create a text file with the extension .pl for your script. A template is given in the file ScriptIdentifyInterfaceResidues.pl; follow the steps described there to complete the script. The script reads the three RSA files (protein 1, protein 2 and complex), compares the RSA values of each residue, and saves the interface residues in an output file. Execute the completed script to obtain the list of interface residues, then use the visualization tool to check that these residues are located at the interface.

Hubbard SJ, Thornton JM (1993) 'NACCESS', Computer Program. Department of Biochemistry and Molecular Biology, University College London.
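The comparison described in STEP 2 and STEP 3 can be sketched in Perl as follows. This is a minimal sketch, not the course template ScriptIdentifyInterfaceResidues.pl: the script name find_interface.pl, the subroutine names, the output format and the zero cutoff are hypothetical choices. It assumes that .rsa residue records are whitespace-separated with a non-blank chain ID, as in the example records. In those records the fifth field (135.22 for MET A 126) is the all-atom absolute accessibility and the sixth (69.6) is the relative value in %; the sketch compares the fifth field.

```perl
#!/usr/bin/perl
# Minimal sketch, NOT the course template ScriptIdentifyInterfaceResidues.pl.
# Hypothetical usage:
#   perl find_interface.pl protein1.rsa protein2.rsa complex.rsa interface.txt
use strict;
use warnings;

# Read the RES records of an open .rsa filehandle into a hash:
#   "chain:resnum" => [residue name, all-atom absolute accessibility]
# Assumes whitespace-separated fields and a non-blank chain ID;
# other record types (REM, CHAIN, TOTAL) are skipped.
sub read_rsa {
    my ($fh) = @_;
    my %acc;
    while (my $line = <$fh>) {
        next unless $line =~ /^RES\s/;
        my (undef, $name, $chain, $resnum, $abs_acc) = split ' ', $line;
        $acc{"$chain:$resnum"} = [ $name, $abs_acc ];
    }
    return \%acc;
}

# Sort "chain:resnum" keys by chain, then numerically by residue number.
sub by_chain_then_number {
    my ($ca, $na) = $a =~ /^(.+?):(-?\d+)/;
    my ($cb, $nb) = $b =~ /^(.+?):(-?\d+)/;
    return $ca cmp $cb || $na <=> $nb || $a cmp $b;
}

# Residues whose accessibility in the complex is lower than in the unbound
# chains by more than $cutoff. Returns [key, name, decrease] triples.
sub find_interface {
    my ($unbound, $complex, $cutoff) = @_;
    my @hits;
    for my $key (sort by_chain_then_number keys %$unbound) {
        next unless exists $complex->{$key};    # residue missing from complex
        my ($name, $free) = @{ $unbound->{$key} };
        my $decrease = $free - $complex->{$key}[1];
        push @hits, [ $key, $name, $decrease ] if $decrease > $cutoff;
    }
    return @hits;
}

# Open a file, parse it with read_rsa and close it again.
sub read_rsa_file {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $acc = read_rsa($fh);
    close $fh;
    return $acc;
}

# Only run the full comparison when given the four file names.
if (@ARGV == 4) {
    my ($rsa1, $rsa2, $rsa_complex, $out_file) = @ARGV;
    # Chains A and B come from different files, so their keys do not clash.
    my %unbound = (%{ read_rsa_file($rsa1) }, %{ read_rsa_file($rsa2) });
    my $complex = read_rsa_file($rsa_complex);
    my $cutoff  = 0;    # hypothetical threshold (square angstroms): any drop counts

    open(my $out, '>', $out_file) or die "Cannot write $out_file: $!\n";
    print {$out} "chain:residue\tname\tdecrease\n";
    for my $hit (find_interface(\%unbound, $complex, $cutoff)) {
        printf {$out} "%s\t%s\t%.2f\n", @$hit;
    }
    close $out;
}
```

With the files from STEP 2 it would be run as `perl find_interface.pl Arabidopsis_HistidineKinase4.rsa Arabidopsis_TransZeatin.rsa Arabidopsis_Complex_HistidineHinase4_TransZeatin.rsa interface.txt`, where interface.txt is a hypothetical output name. Because read_rsa takes an open filehandle rather than a file name, it can also be tried on an in-memory string (`open(my $fh, '<', \$text)`). If small accessibility differences show up for residues far from the interface, a small positive cutoff may give a cleaner list.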
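For the visual check at the end of STEP 3, a list of interface residues can be turned into a PyMOL `select` command. The helper below is a hypothetical sketch: it takes residue labels of the form chain:resnum (for example A:126 for MET 126 of chain A in the example .rsa records) and assumes the chain IDs and residue numbers in the .pdb file match those in the .rsa files.

```perl
use strict;
use warnings;

# Hypothetical helper: build a PyMOL "select" command from labels of the
# form "chain:resnum". Expects at least one label.
sub pymol_selection {
    my ($name, @labels) = @_;
    my %by_chain;
    for my $label (@labels) {
        my ($chain, $resnum) = split /:/, $label, 2;
        push @{ $by_chain{$chain} }, $resnum;
    }
    # One "(chain X and resi n1+n2+...)" term per chain, joined with "or".
    my @parts = map { "(chain $_ and resi " . join('+', @{ $by_chain{$_} }) . ')' }
                sort keys %by_chain;
    return "select $name, " . join(' or ', @parts);
}

print pymol_selection('interface', 'A:126', 'A:127', 'B:391'), "\n";
# prints: select interface, (chain A and resi 126+127) or (chain B and resi 391)
```

After loading Arabidopsis_Complex_HistidineHinase4_TransZeatin.pdb in PyMOL, pasting the printed command into the PyMOL command line creates a selection named interface that can be shown or coloured to see whether the residues lie where the two molecules meet.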