SAP calculation procedure
By Naresh Chennamsetty

The SAP (Spatial-Aggregation-Propensity) algorithm predicts the aggregation-prone regions of a protein (N. Chennamsetty et al., PNAS, 106, 11937 (2009)). SAP also predicts protein binding regions (N. Chennamsetty et al., Proteins, 79, 888 (2011)). The procedure for evaluating SAP for a protein using the scripts developed at MIT is detailed below. This procedure gives SAP values for a static structure (such as an X-ray structure). These values can be averaged over a simulation trajectory to obtain average SAP values for the simulation.

Requirements:
1. A computer with the CHARMM simulation program installed (www.charmm.org). The SAP scripts developed at MIT are written for the CHARMM program. However, the user can reimplement the SAP algorithm in any other scripting language of choice.
2. The structure of the protein of interest (from either X-ray or homology modeling).

Summary of Steps:
1. Name the PDB coordinate files for the different chains within the protein ‘chain1.pdb’, ‘chain2.pdb’, etc. Also set the ‘NUMchains’ parameter within the file ‘inputs.str’ to the total number of chains.
2. Run the SAP script ‘sap.inp’ using the CHARMM program.

Detailed Steps:
1. Name the PDB coordinate files for the different (disjoint) chains within the protein (obtained from either homology modeling or an X-ray structure) ‘chain1.pdb’, ‘chain2.pdb’, etc. Also set the ‘NUMchains’ parameter within the ‘inputs.str’ script to the total number of chains. Any number of chains can be supplied to the SAP script. For example, an antibody Fab fragment has two chains (light and heavy), which should be named ‘chain1.pdb’ and ‘chain2.pdb’, with ‘NUMchains’ set to 2. Preferably, these files should contain only the coordinate information (i.e., lines starting with “ATOM...”) and no other data or lines.
   If your protein has CYS-CYS (disulphide) bonds and you want to preserve them, enter the CYS residue numbers manually in the file ‘cysbonds.str’ and change the parameter ‘DisulphideBond’ to ‘YES’. If you do not enter these CYS residues manually, the program still runs and gives SAP values, but there will be a small error in the SAP values near the CYS residues.
2. Run the SAP script ‘sap.inp’ from the command line, for example: “charmm-executable < sap.inp > out.dat”. The file ‘find_sap’ contains an example run. The ‘sap.inp’ script calls all other scripts and inputs automatically to calculate the final SAP score. The calculation is fast and takes less than 5 minutes to find the SAP values for a protein. If required, more accurate SAP values can be obtained by averaging the SAP values from different protein conformations sampled along a simulation trajectory. The outputs obtained after running the ‘sap.inp’ script are explained below.

Outputs:
1. ‘sap_score_each_atom.dat’ gives the SAP value at each atom (last column).
2. ‘sap_score_residue_average.dat’ gives the average SAP value for each residue.
3. ‘protein_sap_mapped.pdb’ gives the final structure (PDB) file, in which the SAP values are included in the beta field (second to last column). This file can be viewed in molecular visualization programs such as ‘VMD’ to see the protein surface colored according to the SAP values (for example, high-SAP regions in red and low-SAP regions in blue). In VMD, select “Graphics > Representations > Coloring method > Beta” to color the protein according to the SAP values.

Note: By default, the SAP radius is set to R = 10 Å. This can be changed in the file ‘inputs.str’ to any other SAP radius of interest. Typically, R = 10 Å shows broader hydrophobic patches (SAP at low resolution), whereas R = 5 Å shows more detailed hydrophobic patches (SAP at high resolution).
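The averaging over conformations mentioned in step 2 could be done outside CHARMM with a small script. The following is a minimal Python sketch and is not part of the MIT scripts. It assumes that ‘sap.inp’ has been run once per conformation, that each run's ‘sap_score_each_atom.dat’ has been saved under a separate name, that the file is whitespace-separated with the SAP value in the last column (as described under Outputs), and that every file lists the same atoms in the same order. The function names and the file layout are hypothetical.

```python
from pathlib import Path


def read_sap_column(path):
    """Return the last whitespace-separated field of each data line as a float.

    Assumption: the SAP value is the last column; lines whose last field is
    not numeric (e.g. a header) are skipped.
    """
    values = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            values.append(float(fields[-1]))
        except ValueError:
            continue  # skip non-numeric lines
    return values


def average_sap(paths):
    """Average SAP values row by row across several per-atom output files."""
    columns = [read_sap_column(p) for p in paths]
    if not columns:
        raise ValueError("no SAP files given")
    n_atoms = len(columns[0])
    if any(len(c) != n_atoms for c in columns):
        raise ValueError("files do not contain the same number of atoms")
    # zip(*columns) groups the values of one atom across all conformations
    return [sum(vals) / len(columns) for vals in zip(*columns)]
```

For example, with per-conformation outputs saved as ‘frame1/sap_score_each_atom.dat’, ‘frame2/sap_score_each_atom.dat’, and so on (hypothetical paths), calling average_sap on the list of these paths returns one averaged SAP value per atom, in the order in which the atoms appear in the files.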