An analysis of "pdb-care (PDB CArbohydrate REsidue check): a program to support annotation of complex carbohydrate structures in PDB files" by Thomas Lütteke and Claus-W. von der Lieth

By David Chapman

Background

- The Protein Data Bank (PDB) includes 3-D data for carbohydrate structures as well as amino acid structures.
- 3-D data for protein/carbohydrate interactions are obtained through X-ray crystallography and nuclear magnetic resonance (NMR).
- The absence of 3-D glycan data in a PDB entry does not necessarily mean that a potential glycosylation site is unoccupied.
- The crystallography may have been done on proteins expressed from plasmids, which may not carry the same carbohydrates as the human form.
- Glycosylation usually occurs at asparagine residues in Asn-X-Ser/Thr sequons, where X is any residue except proline.
- Approximately 30% of the 1663 PDB entries containing carbohydrates (Sep 2003) contain errors in the glycan description.

Biological Significance

- Protein/carbohydrate interactions are important because they are involved in a variety of biological processes:
  - Fertilization
  - Embryonic development
  - Cellular differentiation

Background

- The high error rate in PDB glycan descriptions is mainly due to incorrect assignment of saccharide units.
- Sequences for complex carbohydrates differ significantly from single-letter amino acid sequences:
  - The number of naturally occurring residues is much larger for carbohydrates.
  - Each pair of monosaccharide residues can be linked in several ways.
  - A residue can be connected to three or four others (branching).
- Unlike amino acid sequences, which can be written in a single-letter code, carbohydrate residues use three-letter codes that are defined in the HET dictionary in PDB.
- A new residue name is required for each stereochemically different sugar unit.
- This makes correct assignment complicated, tedious, and error-prone.
- Examples of definitions of carbohydrate residues:
  - AGC: alpha-D-Glucopyranose
  - BGC: beta-D-Glucopyranose
  - FCA: alpha-D-Fucose
  - FCB: beta-D-Fucose
- More than 200 carbohydrate residues are used in PDB.
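
To make the residue naming scheme concrete, the following Python sketch reads the residue name from a PDB record and looks it up in a small table. The table holds only the four example codes AGC, BGC, FCA and FCB quoted in the background; the function names (`residue_name`, `describe_residue`) and the sample HETATM line are made up for illustration, and this is not how pdb-care itself works. It assumes the standard fixed-column PDB layout, in which the residue name occupies columns 18-20.

```python
# Illustrative table: only the four example codes quoted in the background.
# The real HET dictionary defines more than 200 carbohydrate residues.
CARB_RESIDUES = {
    "AGC": "alpha-D-Glucopyranose",
    "BGC": "beta-D-Glucopyranose",
    "FCA": "alpha-D-Fucose",
    "FCB": "beta-D-Fucose",
}


def residue_name(pdb_line):
    """Return the three-letter residue name of an ATOM/HETATM record.

    In the fixed-column PDB format the residue name sits in columns 18-20,
    which is the slice [17:20] with zero-based indexing.
    """
    return pdb_line[17:20].strip()


def describe_residue(pdb_line):
    """Return (code, description); description is None for unknown codes."""
    code = residue_name(pdb_line)
    return code, CARB_RESIDUES.get(code)


# Hypothetical HETATM record, written out only to show the column layout.
example = (
    "HETATM 1234  C1  BGC A 501      12.345  23.456  34.567"
    "  1.00 20.00           C"
)

print(describe_residue(example))  # ('BGC', 'beta-D-Glucopyranose')
```

Because alpha and beta anomers map to different codes, a single mistyped letter (AGC instead of BGC) changes the stereochemistry that the entry claims without producing any obvious error.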

Implementation

- Pdb-care is based on the pdb2linucs carbohydrate detection program.
- Pdb2linucs identifies and assigns carbohydrate structures using only the reported atom types and their 3-D coordinates.
- The program output is in LINUCS notation, which is used to normalize the description of complex carbohydrate structures.
- Pdb-care uses a translation table written in XML to compare the LINUCS notation from pdb2linucs with the residue assignments in the PDB HET group dictionary.
- The translation table contains:
  - 141 monosaccharides
  - 31 oligosaccharides
  - 77 combined residues
- Pdb-care was written in the C language.
- The front end is a web interface implemented in PHP.
- The pdb-care web interface accepts input in three ways:
  - direct copy/paste of a PDB file
  - selection of a file on a local hard drive
  - a PDB-ID
- The pdb-care protocol reports the types of problems, inconsistencies, and errors detected.

Program Example

- pdb-care examples

Conclusion

The authors made relevant points regarding the biological significance of protein-carbohydrate interactions and the need for accurate glycan residue information in PDB. However, they did not go into detail about the actual implementation of the translation table used in pdb-care, so it is difficult to judge the accuracy of their program.