Protein Rigidity Analysis on the Whole PDB database using FIRST

Diana Jaunzeikare, Faculty advisor: Ileana Streinu, Computer Science Department, Smith College, MA

Proteins are biological molecules that consist of a long chain of amino acids folded into a three-dimensional structure. The 3D structure and the movement of a protein determine its function. By analyzing a protein's flexibility we expect to gain a better understanding of its function. Floppy Inclusion and Rigid Substructure Topography (FIRST) is a piece of software developed at Arizona State University to perform such an analysis. We developed an infrastructure for surveying the entire PDB and collecting rigidity information. The surprising outcome is that most PDB files are not precise enough to support the automation.

Laman's Theorem: A graph G with n vertices and m edges is the graph of a generic minimally rigid framework in the plane if and only if m = 2n - 3 and every subset of n' vertices of G, with 2 <= n' < n, spans at most 2n' - 3 edges.

Tay's Theorem: A multigraph G with n vertices and m edges is the graph of a generic minimally rigid body-bar-hinge framework if and only if m = 6n - 6 and every subset of n' < n vertices spans at most 6n' - 6 edges.

Pebble game: an algorithm that implements the counting conditions of Laman's or Tay's theorem to count the degrees of freedom of a graph. It is a combinatorial (counting) algorithm, so it is very fast and involves no floating-point operations.

Project Description:
Scripts developed to:
* Run a program or script on the whole PDB
* Split NMR models into separate files
* Renumber atoms
* Filter the PDB files by
  * experimental method
  * resolution
  * DNA vs. protein
* Detect FIRST warnings

Background: Proteins
Many proteins have been experimentally solved and deposited into the Protein Data Bank (PDB), which by now contains about 50,000 entries (~30 GB unarchived). Biologists usually pick a select few proteins, or a group of proteins, on which to perform analysis.
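The filtering steps listed in the project description (experimental method, resolution, DNA vs. protein, multiple NMR models) can be illustrated with a short sketch. This is not the poster's actual scripts, which are not shown here; it is a minimal example that assumes the standard fixed-column PDB text format (EXPDTA, REMARK 2, MODEL, ATOM and HETATM records). The names `PdbSummary`, `summarize_pdb` and `keep_entry`, and the 2.5 Å cutoff, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Residue names used for DNA nucleotides in the PDB format.
DNA_RESIDUES = {"DA", "DC", "DG", "DT"}


@dataclass
class PdbSummary:
    """Facts about one PDB file that a survey script might filter on."""
    method: str = ""                    # text of the EXPDTA record(s)
    resolution: Optional[float] = None  # in angstroms, None if not given
    n_models: int = 0                   # number of MODEL records
    n_hetatm: int = 0                   # ligand, water and other hetero atoms
    n_altloc: int = 0                   # atoms flagged with an alternate location
    has_dna: bool = False


def summarize_pdb(lines: Iterable[str]) -> PdbSummary:
    """Scan the lines of a PDB file once and collect the fields above."""
    s = PdbSummary()
    for line in lines:
        record = line[:6].strip()
        if record == "EXPDTA":
            # The technique text occupies columns 11-79.
            s.method = (s.method + " " + line[10:79].strip()).strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            # e.g. "REMARK   2 RESOLUTION.    1.80 ANGSTROMS."
            try:
                s.resolution = float(line[22:].split()[0])
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE." in NMR entries
        elif record == "MODEL":
            s.n_models += 1
        elif record in ("ATOM", "HETATM"):
            if record == "HETATM":
                s.n_hetatm += 1
            # Column 17 holds the alternate-location indicator.
            if len(line) > 16 and line[16] != " ":
                s.n_altloc += 1
            # Columns 18-20 hold the residue name.
            if line[17:20].strip() in DNA_RESIDUES:
                s.has_dna = True
    return s


def keep_entry(s: PdbSummary, max_resolution: float = 2.5) -> bool:
    """Hypothetical filter: X-ray structures without DNA at or below a cutoff."""
    return (
        "X-RAY" in s.method
        and s.resolution is not None
        and s.resolution <= max_resolution
        and not s.has_dna
    )
```

A file can be summarized by passing an open file handle, for example `summarize_pdb(open(path))`. A value of `n_models` greater than one suggests the file should be split into separate models, and nonzero `n_hetatm` or `n_altloc` counts flag entries that may need cleaning before an automated run.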
Proteins as Robotic Arms = Chain of Rigid Bodies
The bond angles CA-N-C and N-C-CA are fixed. The backbone atoms CA, C, N, CA around each peptide bond lie in a plane, so they can be modeled as a single unit, a rigid body. Consecutive bodies can rotate relative to each other through two dihedral angles: φ about the N-CA bond and ψ about the CA-C bond.

In the picture above, HIV-1 protease is shown with a flap open and closed. The rigidity of the protein changes with the state of the flap. The flap motion is essential to the function of this protein.

Analyzing whole PDB:
Only 11% of the 50,000 files run without any problems.

Automation
The problems found in the PDB:
* One file can consist of several models
* There are additional atoms, for example ligands and water
* Some atoms have alternate positions, which confuses the software and produces wrong bonds

Future work:
Insert missing residues using robotics methods.

References:
[1] D. J. Jacobs, A. J. Rader, L. A. Kuhn, and M. F. Thorpe. "Protein flexibility predictions using graph theory." Proteins: Structure, Function, and Genetics 44 (2001): 150-165.
[2] D. J. Jacobs and M. F. Thorpe. "Generic Rigidity Percolation: The Pebble Game." Phys. Rev. Lett. 75 (1995): 4051-4054.
[3] A. Lee, I. Streinu, and L. Theran. "Rigidity and pebble games." http://linkage.cs.umass.edu/pg/pg.html
[4] Protein Data Bank. http://www.rcsb.org/pdb/home/home.do