THE AXIOM
Databanks + New tools = New insights

SADIC: Simple Atom Depth Index Calculator
protein fold barcoding
CATH-ADAPT…

SADIC: a new tool to analyze atom depth
Digging inside objects to discover their origins: birth of the Earth, protein folding.

Atom depth 2D (HEWL, PDB ID 4lzt)
Atom depth calculated as the distance from:
  the closest external water*
  the closest dot of the water accessible surface*
  the closest surface exposed atom*
* Chakravarty S, Varadarajan R. Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold Des. 1999, 7:723-732.
* Pintar A, Carugo O, Pongor S. Atom depth as a descriptor of the protein interior. Biophys J. 2003, 84:2553-2561.

Atom depth 3D: calculation of exposed volumes (HEWL, PDB ID 4lzt)
Depth index: Di,r = 2 Vi,r / V0,r
where Vi,r is the exposed volume of a sphere of radius r centered on atom i of the molecule, and V0,r is the exposed volume of the same sphere when centered on an isolated atom.
By this definition Di,r runs from 0 (no exposed volume left in the sphere) to 2 (isolated atom).
The sphere radius r should be the largest value that makes Vi,r = 0 for the most buried atom.
Daniele Varrazzo, Andrea Bernini, Ottavia Spiga, Arianna Ciutti, Stefano Chiellini, Vincenzo Venditti, Luisa Bracci and Neri Niccolai. Three-dimensional Computation of Atomic Depth in Complex Molecular Structures. Bioinformatics 2005, 21:2856-2860.

[Plot: Di,r (0.0 to 2.0) versus r [Å] (4.0 to 24.0), HEWL 4lzt]

Atom depth 3D vs 2D (HEWL, PDB ID 4lzt)
  Thr 47 α carbon: Di,9 = 1.59
  Ile 58 α carbon: Di,9 = 0.13
  Trp 28 α carbon: Di,9 = 0.03

3D atom depth analysis (Di) from PDB ID 1UBQ
http://www.sbl.unisi.it/prococoa/

SBL Bioinformatics Projects
Projects correlated with SADIC:
1. fold dependent aa compositions of protein cores;
2. towards i-SADIC.
Projects uncorrelated with SADIC:
1. systematic analysis of PPI.

Di analysis of protein atoms
Defining structural layers in protein 3D structures:
  each structural layer includes atoms with similar Di values;
  fast and accurate analysis of the aa content of structural layers.

Di analysis of protein atoms: 3VTR (chitinolytic enzyme, 572 aa)
Layer   Di          Color
L6      > 1.2       red
L5      1.0 - 1.2   orange
L4      0.8 - 1.0   yellow
L3      0.6 - 0.8   green
L2      0.4 - 0.6   blue
L1      0.2 - 0.4   indigo
L0      < 0.2       violet

3D atom depth analysis from PDB ID 1UBQ (Di per atom; Dimax is the largest Di in the residue)
K63   N 0.19   CA 0.30   C 0.25   O 0.23   CB 0.50   CG 0.68   CD 0.91    CE 1.11    NZ 1.29    Dimax 1.29
E24   N 0.38   CA 0.52   C 0.50   O 0.52   CB 0.76   CG 0.95   CD 1.17    OE1 1.24   OE2 1.24   Dimax 1.24
L43   N 0.10   CA 0.05   C 0.11   O 0.18   CB 0.02   CG 0.02   CD1 0.02   CD2 0.00              Dimax 0.18

Dimax analysis of protein residues
Defining aa occupancy in protein structural layers:
  each structural layer includes residues with similar Dimax values;
  fast and accurate analysis of aa distribution in protein structures.

Dimax analysis of protein singles
Quite a few proteins like to stay single (at least in the crystalline state).
Bioinformatiha 2, Firenze, 18 October

A database of protein singles
Experimental Method: X-RAY              (79,770)
Chain Type: Protein                     (74,456)
Only 1 chain in asym. unit              (28,803)
Oligomeric state: 1                     (21,193)
Number of Entities: 1                   (3,517)
Homologue Removal @ 95% identity        (2,410)
DOOPS: 2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues
Swiss-Prot: 540,958 proteins in the dataset (192 Maa)
[Plot: only axis ticks survive (0 to 18; 1 to 2001)]

Dimax analysis of protein cores
DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues
Calculation of % amino acid content in L0: the first quantitative analysis of a large array of protein cores!
Core aa if Dimax < 0.2 (~20 % of total molecular volume)
ΣDOOPS aa(L0) = 106,088 (from 2,410 proteins)

aa % in L0
Alanine         11.51
Cysteine         2.63
Aspartate        1.77
Glutamate        1.20
Phenylalanine    6.36
Glycine         10.81
Histidine        1.32
Isoleucine      11.74
Lysine           0.58
Leucine         16.27
Methionine       2.49
Asparagine       1.70
Proline          2.45
Glutamine        1.21
Arginine         0.83
Serine           4.85
Threonine        4.65
Valine          13.70
Tryptophan       1.43
Tyrosine         2.50

Di analysis of protein cores: folding clues from aa core composition?
CATH Class            Architectures   Topology   Homologous superfamily   Domains
1 (mainly α)          5               386        875                      37,038
2 (mainly β)          20              229        520                      43,881
3 (α & β)             14              594        1113                     90,029
4 (few sec. str.)     1               104        118                      2,588
Total                 40              1313       2626                     173,536
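The chain from exposed volumes to structural layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "exposed volume" is approximated by counting grid points of the probe sphere that lie outside every atom sphere, which is a simplification and not necessarily how SADIC computes it; the function names, the atom input format and the grid step are hypothetical. Layer bounds are read from the colour table, each layer is assumed to include its lower bound, and Dimax is taken as the largest Di among the atoms of a residue.

```python
import math

# Upper bounds of layers L0..L5, read from the colour table; anything above is L6.
# Assumption: each layer includes its lower bound and excludes its upper bound.
LAYER_UPPER_BOUNDS = (0.2, 0.4, 0.6, 0.8, 1.0, 1.2)


def exposed_volume(center, radius, atoms, step=1.0):
    """Grid estimate of the part of a probe sphere that no atom sphere covers.

    atoms is a list of ((x, y, z), atom_radius) tuples (hypothetical format).
    """
    cx, cy, cz = center
    n = int(math.ceil(radius / step))
    free_points = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dx, dy, dz = i * step, j * step, k * step
                if dx * dx + dy * dy + dz * dz > radius * radius:
                    continue  # grid point lies outside the probe sphere
                px, py, pz = cx + dx, cy + dy, cz + dz
                covered = any(
                    (px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2 <= ar * ar
                    for (ax, ay, az), ar in atoms
                )
                if not covered:
                    free_points += 1
    return free_points * step ** 3


def depth_index(atom_index, atoms, radius=9.0, step=1.0):
    """Di,r = 2 * Vi,r / V0,r for the atom at atom_index."""
    center, _ = atoms[atom_index]
    v_i = exposed_volume(center, radius, atoms, step)                # atom inside the molecule
    v_0 = exposed_volume(center, radius, [atoms[atom_index]], step)  # same atom, isolated
    return 2.0 * v_i / v_0


def layer_of(di):
    """Map a depth index (or a Dimax) to its structural layer number, 0..6."""
    for layer, upper in enumerate(LAYER_UPPER_BOUNDS):
        if di < upper:
            return layer
    return len(LAYER_UPPER_BOUNDS)


def residue_dimax(atom_depths):
    """Dimax of a residue: the largest Di among its atoms."""
    return max(atom_depths.values())


# Di values of Leu 43 in 1UBQ as listed on the slide.
L43 = {"N": 0.10, "CA": 0.05, "C": 0.11, "O": 0.18,
       "CB": 0.02, "CG": 0.02, "CD1": 0.02, "CD2": 0.00}
dimax = residue_dimax(L43)
print(f"L43: Dimax = {dimax:.2f}, layer L{layer_of(dimax)}")  # Dimax 0.18 falls in L0 (core)

# Toy check: for an isolated atom Vi,r equals V0,r, so the index is 2.0.
lone_atom = [((0.0, 0.0, 0.0), 1.7)]
print(depth_index(0, lone_atom, radius=3.0))
```

A regular grid is used instead of random sampling so that the result is reproducible; a finer `step` trades speed for accuracy. Counting, for every residue of a dataset, how often each amino acid type lands in layer 0 would give an L0 composition table of the kind shown for DOOPS.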
DOOPS + CATH: selected Architectures with ≥ 10 PDB files
Architecture   # Proteins   mono (domain)
1.10           213          (84)
1.20           84           (40)
1.25           19           (17)
1.50           10           (3)
2.10           17           (13)
2.30           57           (37)
2.40           94           (73)
2.60           134          (110)
2.80           12           (12)
3.10           84           (73)
3.20           52           (44)
3.30           139          (106)
3.40           218          (203)
3.60           10           (8)
3.90           49           (49)
total          1,190        (872)

Towards protein folding barcodes
% aa in L0 by CATH Architecture (overall: all 2,410 DOOPS proteins)
aa    1.10  1.20  1.25  1.50  2.10  2.30  2.40  2.60  2.80  3.10  3.20  3.30  3.40  3.60  3.90 overall
ALA  13.28 10.32 21.46 12.74  9.26 10.05  8.43  9.32   5.5 10.69 10.08 12.58 11.88 14.95 12.01   11.51
ARG    0.6  1.28  0.24  1.39     0  0.64  1.72  0.75     0  0.55  1.11  1.75   0.3  0.47  0.95    0.83
ASN   0.67  2.62  0.73  2.77  1.85  2.04  1.77  1.36     0   2.1   2.9  0.96  1.52   2.8   2.1    1.70
ASP   1.61  2.62  0.24  2.91  1.23  1.27  2.03  1.79     0   2.1   2.9  3.02  1.77  2.34  0.95    1.77
CYS   3.35  2.99  5.37  0.83 22.84  2.04  1.46  4.42  0.92  2.83   2.1  1.49  1.86   1.4  3.05    2.63
GLN    0.6   1.5  0.24  1.11  1.23  1.15  1.81  1.69     0  0.46  1.56  2.15  0.99   1.4  1.33    1.21
GLU   1.48  1.44  0.73  1.52     0  1.15  1.19  1.04     0  0.91  2.59  2.41  1.08  0.93  0.67    1.20
GLY   8.05  8.72  9.76 13.85 16.05  9.92  16.2 10.82  9.17  8.78 11.81 11.35 12.64 13.08  9.91   10.81
MET   2.62  4.17  1.71  4.99     0   2.8  2.65  3.15  1.83  2.93  2.76  2.41  2.39  3.27  1.91    2.49
PHE   6.44  6.79  2.93  4.57  4.32  7.12  7.06  6.73  15.6  7.22  4.95  6.18  6.07  4.21  6.01    6.36
PRO   1.34  2.46  3.41  2.63  3.09  3.31     3  2.78     0  3.29   2.9  1.84  2.25   1.4  1.81    2.45
SER   3.49  4.55  3.66  5.96  3.09  5.34  5.56  5.13  2.75  2.83  5.35  4.43  4.23  6.07  5.34    4.85
THR   2.28  4.81  4.15   7.2  5.56  3.31  5.12  4.47  0.92   3.2  5.22  4.25  4.94  5.14  5.91    4.65
TRP   1.01  1.55     0  2.77   3.7  0.38  1.63  2.78  2.75  2.19  1.52  0.66  1.26  0.47   2.1    1.43
TYR   2.62  3.69  0.24  4.57  2.47  1.27  2.69  4.38  0.92  3.29  3.12  1.58  2.32     0  2.29    2.50
VAL  12.34  9.68  9.51  7.62  9.88 16.28 12.75 13.51 11.93 14.53 12.88  11.7 16.29 19.16 15.54    13.7

HIS, ILE, LEU and LYS: overall 1.32, 11.74, 16.27 and 0.58 respectively. The per-architecture values of these four rows are interleaved in the source and their column order is not recoverable. In source order they read:
0.79 0.56 0 2.65 1.96 0.47 2.48
12.8 11.77 12.53 11.53 7.01 11.34
1.01 1.6 2.44 1.11 0.62 0.76
12.68 9.95 10.73 8.59 6.79 13.61 10.68 10.78 13.76
8.02 17.18 12.97 13.98 33.94 16.54 11.9 14.33 14.22 15.42 13.63
0.38 0.49 0.56 0 0.09 0.62 1.36 0.55 0 0.67
23.88 18.34 22.44 11.77
1.91 0.67 0.91 0 1.11
0 3.02

Legend: av = aa % average value; marked thresholds av + 2σ / av - σ and av + σ / av - 2σ.
Labelled amino acids: Ala, Cys, Leu, Phe, Val.

CATH-ADAPT: CATH - atom depth assisted protein tomography
Architectures shown: alpha horseshoe, ribbon, trefoil, four layer sandwich
PDB IDs shown: 3CKC(A02), 1RG8(A00), 2IMH(A01), 1UZK(A01)
Di of 173,536 CATH domains computed in 28 h 5 min (average computation time 1.72 s/domain).
Calculations performed on a computer based on a 6-core 990X CPU.

Towards protein folding barcodes
Putting the protein universe in order

Towards i-SADIC (implemented SADIC)
H/D exchange rate profiles compared with:
  2D atom depth: dnwi, the distance of atom i from the nearest water molecule
  or
  3D atom depth: Di,9, the atom depth index with a probe of radius 9 Å
Data from Pedersen TG, Thomsen NK, Andersen KV, Madsen JC, Poulsen FM. Determination of the rate constants k1 and k2 of the Linderstrom-Lang model for protein amide hydrogen exchange. A study of the individual amides in hen egg-white lysozyme. J Mol Biol. 1993, 230(2):651-660.

iSADIC atom depth
iDi,9 = a Di,9 + b ASAi
        c Di,9 + d dnwi

Protein-protein interface analysis
Biological vs crystallographic interfaces
ARG atoms: N, CA, C, O, CB, CG, CD, NE, CZ, NH1, NH2, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE, HH11, HH12, HH21, HH22
vs
LYS atoms: N, CA, C, O, CB, CG, CD, CE, NZ, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE2, HE3, HZ1, HZ2, HZ3
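The slides give the i-SADIC expression iDi,9 = a Di,9 + b ASAi but do not say how the coefficients are obtained. One plausible route, offered here purely as an assumption, is an ordinary least-squares fit of the two-term combination against a per-atom target profile such as the H/D exchange data. The sketch below solves the 2x2 normal equations directly; the function names and all numbers are hypothetical and are not taken from the slides.

```python
def combine(di9, asa, a, b):
    """Evaluate the two-term combination a*Di,9 + b*ASAi for one atom."""
    return a * di9 + b * asa


def fit_two_coefficients(x1, x2, y):
    """Least-squares a, b for y ~ a*x1 + b*x2 (no intercept).

    Solves the 2x2 normal equations in closed form.
    """
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("x1 and x2 are collinear; a and b are not uniquely determined")
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Hypothetical per-atom values, for illustration only.
di9 = [0.05, 0.20, 0.60, 1.10, 1.50]      # depth indices Di,9
asa = [0.0, 2.5, 10.0, 30.0, 45.0]        # accessible surface areas ASAi
target = [0.1, 0.6, 2.0, 4.8, 7.0]        # stand-in for an experimental profile

a, b = fit_two_coefficients(di9, asa, target)
fitted = [combine(d, s, a, b) for d, s in zip(di9, asa)]
print(a, b)
print(fitted)
```

The same routine would apply unchanged to the second pair of terms on the slide (Di,9 and dnwi) by passing the water distances in place of the ASA values.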