Chapter 12 Protein Structure Basics
•20 naturally occurring amino acids
•Free amino group (-NH2)
•Free carboxyl group (-COOH)
•Both groups linked to a central carbon (Cα)
Dihedral Angles
Ramachandran plot
Hierarchy
•Primary structure
  •Linear sequence of amino acids
•Secondary structure
  •Local conformation of the peptide chain
  •Stabilized by H-bonds between NH and C=O of different residues
•Tertiary structure
  •3-dimensional arrangement of all secondary structure elements and connecting regions
•Quaternary structure
  •Assembly of several polypeptide chains into a protein complex
Stabilizing forces
•Secondary to quaternary structure is maintained by non-covalent forces
•Electrostatic interactions
  •Excess negative charge balanced by positive charge in another region
  •Salt bridge
•Van der Waals forces
  •Induced dipole
•Hydrogen bonding
  •Sharing of a proton by two electronegative atoms
  •Short distance (<3 Å)
Helices
•3.6 aa per turn
•φ ≈ -60º
•ψ ≈ -45º
•A, Q, L, M frequent
•P, G, Y scarce
β-Sheet
•H-bonded β-strands
•Parallel
•Anti-parallel
Coiled-coil
•Example structure: 1KD8
Tertiary Structures
Globular proteins
•Compact
•Polar and hydrophilic amino acids on the outside
•Hydrophobic amino acids on the inside
Integral Membrane Proteins
•Exist in lipid bilayers
•Helix segments
•Connecting loops lie in the aqueous phase
X-ray crystallography
•Protein crystallized
•Illuminated with X-ray beam, and diffraction pattern recorded
•Diffraction pattern converted to electron density map by Fourier transformation
•Converting the 2D diffraction pattern into a 3D electron density map requires phase information, which the diffraction pattern does not record
  •Molecular replacement
    •Use homologous protein structure as template
  •Multiple isomorphous replacement
    •Compare electron density changes in protein crystals containing strongly diffracting heavy metals
•Build a model with the amino acid residues that best fit the density map
NMR
•Proteins labeled with 13C or 15N
•Radiofrequency radiation used to induce nuclear spin state transitions in a magnetic field
•Interactions between spinning isotope pairs produce radio signal peaks
that correlate with the distance between them
•Information on distances between all pairs allows a protein model to be derived
•NMR determines structure in solution
•Dynamic conformations mean that 20-40 structures satisfy the distance constraints
•Can only solve small proteins (<200 aa)
Protein Structure Database
•x, y, z position of each atom in crystal
•http://www.rcsb.org/pdb/
[Figure: growth of the PDB, 1972-2008. Panels: total proteins per year (axis 0-60000) and total folds per year (axis 0-1200).]
PDB File Format
HEADER    STRUCTURAL PROTEIN                      19-JAN-00   1DXX
TITLE     N-TERMINAL ACTIN-BINDING DOMAIN OF HUMAN DYSTROPHIN
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: DYSTROPHIN;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 FRAGMENT: ACTIN-BINDING;
COMPND   5 ENGINEERED: YES;
COMPND   6 MUTATION: YES
....
ATOM      1  N   ASP A   9      12.508 -13.297 -10.855  1.00 72.03           N
ATOM      2  CA  ASP A   9      13.095 -13.021  -9.506  1.00 73.14           C
ATOM      3  C   ASP A   9      12.436 -11.836  -8.798  1.00 73.18           C
ATOM      4  O   ASP A   9      12.528 -11.643  -7.564  1.00 73.10           O
ATOM      5  CB  ASP A   9      14.604 -12.820  -9.611  1.00 73.74           C
ATOM      6  N   SER A  10      11.786 -10.979  -9.601  1.00 70.17           N
ATOM      7  CA  SER A  10      11.064  -9.874  -8.982  1.00 65.94           C
ATOM      8  C   SER A  10       9.584 -10.270  -8.884  1.00 62.93           C
ATOM      9  O   SER A  10       9.105 -10.327  -7.742  1.00 65.59           O
ATOM     10  CB  SER A  10      11.170  -8.536  -9.692  1.00 64.61           C
ATOM     11  OG  SER A  10      12.228  -8.489 -10.623  1.00 66.53           O
ATOM     12  N   TYR A  11       8.923 -10.531 -10.021  1.00 55.09           N
ATOM     13  CA  TYR A  11       7.469 -10.665 -10.022  1.00 47.65           C
ATOM     14  C   TYR A  11       7.021  -9.267  -9.544  1.00 47.76           C
ATOM     15  O   TYR A  11       6.507  -9.012  -8.432  1.00 43.11           O
ATOM     16  CB  TYR A  11       6.902 -11.787  -9.161  1.00 49.31           C
ATOM     17  N   GLU A  12       7.465  -8.308 -10.384  1.00 46.50           N
ATOM     18  CA  GLU A  12       7.227  -6.877 -10.295  1.00 40.38           C
ATOM     19  C   GLU A  12       6.129  -6.708 -11.389  1.00 35.66           C
ATOM     20  O   GLU A  12       6.474  -6.721 -12.555  1.00 32.67           O
ATOM     21  CB  GLU A  12       8.238  -5.854 -10.720  1.00 43.57           C
ATOM     22  CG  GLU A  12       9.467  -5.254 -10.159  1.00 45.68           C
ATOM     23  CD  GLU A  12       9.287  -4.625  -8.796  1.00 51.24           C
ATOM     24  OE1 GLU A  12       8.844  -3.454  -8.787  1.00 54.08           O
ATOM     25  OE2 GLU A  12       9.501  -5.315  -7.773  1.00 52.85           O
ATOM     26  N   ARG A  13       4.898  -6.585 -10.978  1.00 32.78           N
ATOM     27  CA  ARG A  13       3.758  -6.423 -11.854  1.00 25.88           C
ATOM     28  C   ARG A  13       3.458  -4.954 -11.964  1.00 24.51           C
ATOM     29  O   ARG A  13       2.709  -4.478 -11.111  1.00 30.56           O
ATOM     30  CB  ARG A  13       2.546  -7.147 -11.212  1.00 23.91           C
ATOM     31  CG  ARG A  13       2.797  -8.664 -11.236  1.00 27.29           C
Other structure file formats
mmCIF
•Macromolecular crystallographic information file
•Similar to a relational database
•Each field is assigned a tag and linked to another field
MMDB
•Molecular modeling database
•ASN.1 format
•Nested hierarchy
Chapter 13 Protein structure visualization, comparison and classification
Download and install Jmol
•http://jmol.sourceforge.net/
•Wireframe
•CPK (Corey, Pauling and Koltun)
•Ball-and-stick
•Cartoon
•Rendered in POV-Ray: http://www.povray.org/
Protein structure comparisons
•Comparing two protein structures is a fundamental technique in protein analysis
•Finding remote homologs
•Protein structures can be very similar even if sequence identity is very low (<20%)
Intermolecular method
•Identify equivalent residues
•Translate one structure relative to the other until both occupy the same space
•Rotate one structure relative to the other, and continuously calculate distances between equivalent residues
•Root mean square deviation: $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} D_i^2}$, where $D_i$ is the distance between the $i$th pair of equivalent residues and $N$ is the number of pairs
•Larger proteins have larger RMSD
•Difficult to identify equivalent residues
  •Discard regions outside secondary structures
  •Work with 6-9 residue fragments
  •Dynamic programming, starting with a few equivalent residues
Intramolecular method
•Calculate a distance matrix of all residue distances within each of the two proteins, separately
•Shift the two matrices relative to each other until their differences are minimal
•Good for identifying similar secondary structure regions in two proteins
Multiple structure alignment
•Compare structures in pairwise fashion, generating matrices based on RMSD scores
•Construct phylogenetic tree
•The two most similar structures are realigned
•A median structure is created, to which other, more distant structures are systematically aligned
DALI
•Distances calculated from intramolecular Cα distance matrices
•Matrices are aligned to find local structural similarities
•Calculate Z-score
CE
•Combinatorial Extension
•Like DALI, but compares fragments of eight residues
VAST
•Vector Alignment Search Tool
•Uses intra- and intermolecular approaches
SSAP
•Intramolecular method
•Dynamic programming to find the residue path with optimal score
STAMP
•Intermolecular approach, using dynamic programming
Protein structure classification
•Classification systems allow identification of relationships between structures
•Provide an evolutionary view of all structures
•Newly solved structures can be fitted into the hierarchy, defining possible functions
SCOP (Structural Classification of Proteins)
•Manual examination of structures
•Classes, folds, superfamilies and families
•Families share high sequence similarity
•Superfamilies may have common ancestral proteins
•Folds look at order and connectivity of secondary structures, and may not be evolutionarily related
•Classes: folds with similar core structures: all-α, all-β, and α/β, etc.
CATH (Class, Architecture, Topology and Homologous superfamily)
•Uses automatic assignment with SSAP as well as manual comparison
•Class similar to SCOP
•Architecture is intermediate between SCOP fold and class: overall packing and arrangement of secondary structures without regard for connectivity
•Topology = SCOP fold
•Homologous superfamily and homologous family are equivalent to SCOP superfamily and family
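The fixed-column layout of the ATOM records in the PDB File Format slide can be read with simple string slicing. The sketch below is illustrative only: the `Atom` tuple and `parse_atom_line` function are hypothetical names introduced here, the column ranges follow the standard PDB ATOM record layout, and the three sample lines are copied from the 1DXX excerpt.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """One ATOM record (hypothetical container, not part of any library)."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-column ATOM record into typed fields."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain=line[21],                  # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        b_factor=float(line[60:66]),     # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )


# Three records from the 1DXX excerpt.
SAMPLE = [
    "ATOM      1  N   ASP A   9      12.508 -13.297 -10.855  1.00 72.03           N",
    "ATOM      2  CA  ASP A   9      13.095 -13.021  -9.506  1.00 73.14           C",
    "ATOM      7  CA  SER A  10      11.064  -9.874  -8.982  1.00 65.94           C",
]

atoms = [parse_atom_line(line) for line in SAMPLE]
# Structure comparison usually works on the C-alpha trace only.
ca_atoms = [a for a in atoms if a.name == "CA"]
for a in ca_atoms:
    print(a.res_name, a.res_seq, a.x, a.y, a.z)
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together without a separating space when values are wide.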
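The translation and RMSD steps of the intermolecular method can be sketched in a few lines. This is a minimal illustration under stated assumptions: equivalent residues are already paired by list position, the coordinates are made up, and the optimal rotation step is deliberately left out (it needs a least-squares fit that is beyond this sketch). The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def center(coords: Sequence[Coord]) -> List[Coord]:
    """Translate a structure so that its centroid sits at the origin."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD = sqrt((1/N) * sum of squared distances D_i between paired atoms)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


# Made-up coordinates: the second structure is the first shifted by (5, 5, 5).
first = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
second = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in first]

raw = rmsd(first, second)                        # before translation
aligned = rmsd(center(first), center(second))    # after translation
print(f"raw={raw:.3f} aligned={aligned:.3f}")
```

Because the two toy structures differ only by a shift, centering both removes the whole difference; real structures would still need the rotation search described in the slides.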
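The intramolecular method relies on the fact that distances within a structure do not change when the structure is rotated or translated. The sketch below illustrates only that idea, assuming two structures of equal length with residues paired by position and made-up coordinates; it is not the DALI or SSAP algorithm, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """All-against-all distances within ONE structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_matrix_difference(m1: List[List[float]], m2: List[List[float]]) -> float:
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(m1)
    if n == 0 or n != len(m2):
        raise ValueError("matrices must be non-empty and the same size")
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Toy C-alpha traces (made-up coordinates).
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
# Same shape, rotated 90 degrees about the z axis and then shifted.
moved = [(-y + 10.0, x - 4.0, z + 2.0) for x, y, z in original]
# Different shape: the last residue is displaced.
distorted = original[:3] + [(0.0, 3.8, 6.0)]

same = mean_matrix_difference(distance_matrix(original), distance_matrix(moved))
diff = mean_matrix_difference(distance_matrix(original), distance_matrix(distorted))
print(f"same shape: {same:.3f}  different shape: {diff:.3f}")
```

No superposition is needed here, which is why distance-matrix methods avoid the translation and rotation search of the intermolecular approach.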