Overview of Problems with Carbohydrates in the PDB

"...while the functions of DNA and proteins are generally known.....it is much less clear what carbohydrates do..." (Ciba Foundation Symposium, 1988)

A lesson in doing this project: Performance, Feedback, Revision
http://the273.com/2011/05/24/baba-brinkman-performance-feedback-revision-video/ (link provided by Helen Berman)

Priorities change:
- No point: you have had this
- Major part of PDB but not that interesting
- Most interesting chemistry
- Important to understand first
- Every step you do changes the next steps to be done

New Schedule:
- Carbohydrates and the PDB
- Natural Product Carbohydrates
- N- and O-Glycans
- Don't know: see what is appropriate

How much more complex is the glycome of an organism in comparison with its genome?
[Figure: genome (DNA), transcriptome (RNA), proteome (proteins), zymome? (enzymes), lipome (lipids) and glycome (glycans, sugar chains)]

The glycome shows variations in structure, time and space; it changes in response to the environment; and its diversity of structures gives it information-carrying potential.

Laine, R. A. (1994) "A Calculation of all Possible Oligosaccharide Isomers, Both Branched and Linear Yields 1.05 × 10^12 Structures for a Reducing Hexasaccharide: The Isomer Barrier to Development of Single-Method Saccharide Sequencing or Synthesis Systems." Glycobiology 4:759-767.
Proteins compared with polysaccharides:

Proteins | Polysaccharides
Well defined | Often poorly defined
Coded precisely by genes | Synthesised by enzymes without a template
Monodisperse | Polydisperse, and generally larger
~20 building-block residues | Many homopolymers, and rarely more than 3 or 4 different residues
Standard peptide link | Various links: α(1→1), α(1→2), α(1→4), α(1→6), β(1→3), β(1→4), etc.
Normally tightly folded structures | A range of structures (rod to coil)
Some proteins do not possess a folded structure (gelatin); such a poly(amino acid) compares with some linear polysaccharides |

General Characteristics
In nature, most carbohydrates are found bound to other compounds rather than as simple sugars:
- Polysaccharides (starch, cellulose, inulin, gums)
- Glycoproteins and proteoglycans (hormones, blood group substances, antibodies)
- Glycolipids (cerebrosides, gangliosides)
- Glycosides
- Mucopolysaccharides (hyaluronic acid)
- Nucleic acid polymers

Classification of Carbohydrates
Carbohydrates can be classified by size:
- Monosaccharides (monoses or glycoses): trioses, tetroses, pentoses, hexoses
- Oligosaccharides: di-, tri-, tetra-, penta- ... up to 10 residues (the disaccharides are the most important)
- Polysaccharides (or glycans): homopolysaccharides (all the same type) and heteropolysaccharides (mixtures of monomer types)
- Complex carbohydrates (joined to non-carbohydrate molecules)

Derivatives of monosaccharides with biological activities:
1. Phosphate and sulphate esters
2. Alditols
3. Aldonic and uronic acids
4. Deoxysugars
5. Aminosugars
6. Family of sialic acids
7. N-acetylmuramic acid
8. Glycosides

What are you searching?
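As a minimal sketch of the size-based classification above, the hypothetical Python helpers below assign a class from a residue count, a carbon count, or a list of residue names. The 10-residue boundary is taken from the slide; other texts draw the oligo-/polysaccharide line elsewhere, so it is exposed as a parameter (`oligo_max`, an illustrative name) rather than fixed.

```python
# Hypothetical helpers illustrating the size-based classification of carbohydrates.
MONOSACCHARIDE_CLASSES = {3: "triose", 4: "tetrose", 5: "pentose", 6: "hexose"}


def classify_by_size(n_residues: int, oligo_max: int = 10) -> str:
    """Classify by number of monosaccharide residues (cut-off assumed from the slide)."""
    if n_residues < 1:
        raise ValueError("need at least one residue")
    if n_residues == 1:
        return "monosaccharide"
    if n_residues <= oligo_max:
        return "oligosaccharide"
    return "polysaccharide"


def monosaccharide_class(n_carbons: int) -> str:
    """Name a monosaccharide class by its carbon count (trioses to hexoses only)."""
    try:
        return MONOSACCHARIDE_CLASSES[n_carbons]
    except KeyError:
        raise ValueError(f"no class listed for {n_carbons} carbons") from None


def polysaccharide_type(residue_names: list[str]) -> str:
    """Homopolysaccharide if every residue is the same type, else heteropolysaccharide."""
    return "homopolysaccharide" if len(set(residue_names)) == 1 else "heteropolysaccharide"


print(classify_by_size(2))                # oligosaccharide (a disaccharide)
print(monosaccharide_class(6))            # hexose
print(polysaccharide_type(["GLC"] * 12))  # homopolysaccharide
```

Only the size rule is modelled here; "complex carbohydrates" depend on what the sugar is joined to, which a residue count cannot capture.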
June 27th 2011
Number of PDB entries: 73951
Number of chem_comp: 14206
HETNAM records in PDB files: 132117
Number of chem_comp in HETNAM: 12111
Number of chem_comp released: 12289
Number of chem_comp on hold: 1363
Number of chem_comp obsolete: 381
Sum (released + hold + obsolete): 14033
Number in REMOVED list: 381 (REMOVED is not equal to obsolete)

351 chem_comp are missing from the number in the PDB + obsolete + hold, i.e. 14206 - (12111 + 1363 + 381) = 351. These must be the number in remediation, either new or obsolete, from antibiotics/inhibitors.

Searching chem_comp in isolation from the PDB entries is not recommended: check whether a chem_comp exists, and check the LINK records to see if the instance was built correctly.
Note: after a remediation release, hundreds of chem_comp change status.

The majority of potential chemical entities in the PDB exist in a small number of entries:

Appears in | Number of chem_comp
1 entry | 8074
2 entries | 1554
3 entries | 628
4 entries | 365
5 entries | 226
6 entries | 167
7 entries | 124
8 entries | 98
9 entries | 73
10 to 100 entries | 678
101 to 200 entries | 47

For uncommon groups, check the PDB entry!
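The "appears in k entries" table above is a histogram over chem_comp codes. A minimal Python sketch of how such a table could be built, assuming you already have a mapping from each PDB entry to the set of chem_comp codes it contains (how that mapping is obtained is left open). The entry IDs and the `LIG` code in the toy data are invented placeholders, and the function names are hypothetical.

```python
from collections import Counter


def entries_per_chem_comp(entry_to_comps: dict[str, set[str]]) -> Counter:
    """Count in how many entries each chem_comp code appears (once per entry)."""
    counts = Counter()
    for comps in entry_to_comps.values():
        counts.update(set(comps))  # de-duplicate within an entry
    return counts


def occurrence_histogram(counts: Counter) -> dict[int, int]:
    """Map 'appears in k entries' -> number of chem_comp codes with that k."""
    hist = Counter(counts.values())
    return dict(sorted(hist.items()))


# Toy data, invented for illustration (not real PDB content).
toy = {
    "entry_1": {"NAG", "SO4"},
    "entry_2": {"NAG", "LIG"},
    "entry_3": {"SO4", "NAG"},
}
counts = entries_per_chem_comp(toy)
print(counts["NAG"])                 # 3
print(occurrence_histogram(counts))  # {1: 1, 2: 1, 3: 1}
```

Binning the long tail (for example 10 to 100 entries) is a separate grouping step on the histogram keys.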
Top 77 chem_comp by count of released PDB entries (includes 8 sugars: NAG, MAN, BMA, FUC, NDG, GLC, GAL, BGC):

SO4 8761   MN  1541   PEG 609   FUC 437   FE2 339   IPA 258
ZN  6625   FAD 1102   NH2 600   SAH 435   EPE 339   CSO 257
MG  6096   K   1099   PLP 600   NDG 429   ANP 329   COA 246
MSE 5909   ADP  974   CD  572   CIT 427   DMS 324   KCX 237
GOL 5664   MAN  940   ATP 562   PG4 423   TPO 322   BOG 236
CA  5575   FE   847   NI  541   GLC 401   AMP 309   CMO 234
CL  4806   ACE  760   FMN 538   GAL 399   NDP 309   LLP 227
NA  3101   NAD  756   ACY 516   SEP 387   IOD 305   GTP 222
NAG 2697   BMA  738   TRS 505   PCA 386   IMD 291   HEC 221
HEM 2571   CU   711   GDP 493   BGC 377   UNX 280   UNL 218
PO4 2546   BME  649   FMT 480   CO  376   PGE 276   CO3 214
EDO 2236   NAP  644   MES 473   FES 359   PTR 271   MRD 207
ACT 1801   MPD  634   SF4 442   HG  354   NO3 263

NDG and NAG
A common error in N-linked N-acetyl-D-glucosamine attached to asparagine. There are 429 cases (it was ~200 in May 2007, so annotation/deposition is still not alerting the depositor) for which we have had to assign, by stereochemistry matching, the incorrect 2-(acetylamino)-2-deoxy-α-D-glucopyranose (NDG) rather than the correct 2-(acetylamino)-2-deoxy-β-D-glucopyranose (NAG).
Asn-NAG is ALWAYS beta, never alpha.

Deposition of PDB Entries
Refinement programs use geometric restraints as part of refinement. For protein structures, accurate bond and angle parameters are based on a statistical survey of X-ray structures of small compounds from the Cambridge Structural Database (R. A. Engh and R. Huber). Other restraints for proteins, nucleic acids and other common molecules come from the CCP4 monomer library.
These restraints are used in refinement to prevent distortions of model geometry and to increase the observation-to-parameter ratio. The default restraints are for bond lengths, bond angles, dihedral (torsion) angles, chiral centers, planar groups (such as aromatic rings), and nonbonded (VDW) interactions.
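A minimal sketch of a depositor-side check for the NDG/NAG error, assuming legacy PDB-format files with fixed-width LINK records (column positions as in the wwPDB PDB format v3.3 description; mmCIF files carry the same information in the `struct_conn` category instead, which this sketch does not read). The function names are hypothetical. It flags every NDG whose C1 is LINKed to an Asn ND2, since Asn-NAG is always beta and any such hit is a likely mis-assignment.

```python
def parse_link(line: str) -> dict:
    """Extract both linked atoms from a fixed-width PDB LINK record (0-based slices)."""
    return {
        "atom1": line[12:16].strip(), "res1": line[17:20].strip(),
        "chain1": line[21].strip(), "seq1": int(line[22:26]),
        "atom2": line[42:46].strip(), "res2": line[47:50].strip(),
        "chain2": line[51].strip(), "seq2": int(line[52:56]),
    }


def alpha_glcnac_on_asn(pdb_lines) -> list[tuple[str, int]]:
    """Return (chain, resSeq) of each NDG whose C1 is LINKed to an Asn ND2.

    N-linked GlcNAc on Asn is always beta (NAG), so every hit is a candidate error.
    """
    hits = []
    for line in pdb_lines:
        if not line.startswith("LINK"):
            continue
        link = parse_link(line)
        ends = [(link["res1"], link["atom1"], link["chain1"], link["seq1"]),
                (link["res2"], link["atom2"], link["chain2"], link["seq2"])]
        # The Asn may be written as either partner of the LINK, so test both orders.
        for (ra, aa, _, _), (rb, ab, cb, sb) in (ends, ends[::-1]):
            if ra == "ASN" and aa == "ND2" and rb == "NDG" and ab == "C1":
                hits.append((cb, sb))
    return hits
```

Usage: `alpha_glcnac_on_asn(open("model.pdb"))` (file name hypothetical) returns a list of residues to re-examine before deposition; an empty list means no NDG was found LINKed to an Asn.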
Refinement Restraints for Carbohydrates
Although geometry restraints for carbohydrates exist, they are not always used, with the result that there are geometry errors in deposited files. Many of the stereochemical errors can be detected by reference to conformational studies of glycans and to publicly available resources (http://www.glycosciences.de/tools/). However, these errors also indicate a wide discrepancy in the sophistication of the building and validation tools available for protein and carbohydrate models.

The PDB should not contain N-linked glycans unknown to glycobiology. Resources that depositors should use:
http://www.glycome-db.org/
http://www.glycostructures.jp/
http://www.cbs.dtu.dk/databases/OGLYCBASE/
http://www.glycoforum.gr.jp/
http://www.genome.jp/ligand/kcam/
http://www.functionalglycomics.org/static/index.shtml
http://www.glyco.ac.ru/bcsdb3/
http://www.casper.organ.su.se/ECODAB/
http://www.functionalglycomics.org/static/gt/gtdb.shtml
http://akashia.sci.hokudai.ac.jp/
http://hexose.chem.ku.edu/sugar.php
http://www.eurocarbdb.org/
http://glyco3d.cermav.cnrs.fr/glyco3d/
http://www.glycosciences.de/modeling/sweet2/doc/index.php

First of all, a GOOD carbohydrate: PDB 1qbb, di-(N-acetyl-D-glucosamine).
NOTE the role of aromatic amino acid side chains in controlling stereochemical selection.
NOTE also the two positions for the reducing-end O atom: under PDB rules this would be 2 chem_comps, but here alpha- and beta- are the same compound.

Crystallographic Inventions
- Man-(1→3)-GlcNAc and GlcNAc-(1→3)-GlcNAc linkages (of indeterminate anomericity) within the trimannosyl core
- Hybrid-type glycans containing a terminal Man-(1→3)-GlcNAc linkage on the 3-antenna
- β-galactosyl motifs capping oligomannose-type glycans
Entry 2H6O

Crystallographic Inventions
The pilin glycans from Neisseria species share a common structure, in particular with respect to the unusual O-linked sugar residue 2,4-diacetamido-2,4,6-trideoxyhexose (DATDH). However, in the PDB (1AY2, 2PIL), the pilin structure from Neisseria gonorrhoeae shows a galactose-α-1,3-N-acetylglucosamine-serine. In later PDB entries (2HI2, 2HIL), the correct sugar, 2,4-bis(acetylamino)-1,5-anhydro-2,4-dideoxy-D-glucitol, is reported O-linked to serine.
[Figure: 1AY2 (incorrect) and 2HI2 (correct)]

PROBLEM 1: D- vs L- Designation
D and L sugars are mirror images of one another. They have the same root name but a different D/L designation (e.g. D-glucose and L-glucose). Other stereoisomers have unique names (e.g. glucose, mannose, galactose, etc.).

     D-glucose            L-glucose
        CHO                  CHO
    H - C - OH          HO - C - H
   HO - C - H            H - C - OH
    H - C - OH          HO - C - H
    H - C - OH          HO - C - H
        CH2OH                CH2OH

The number of stereoisomers is 2^n, where n is the number of asymmetric (chiral) centers. The 6-C aldoses have 4 asymmetric centers, so there are 16 stereoisomers (8 D-sugars and 8 L-sugars).

D and L on their own tell you almost nothing about the stereochemistry: they only give the configuration at the highest-numbered chiral carbon (C5 in the hexoses).

The result is that authors who refine with a standard residue (e.g. mannose) plus a linkage or an alpha-/beta- C1-OH patch don't necessarily deposit the PDB-required chem_comp name for alpha-mannose (MAN) or beta-mannose (BMA).
If you used R and S for each chiral centre, no chemist would understand that you are describing a sugar, but the stereochemistry would be exactly defined and mistakes avoided.

PROBLEM 2: alpha/beta at C1
[Figure: Haworth projections of α-D-glucose and β-D-glucose, carbons numbered 1 to 6]
Cyclization of glucose produces a new asymmetric center at C1.
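The 2^n rule and the extra centre created by ring closure can be checked with a small sketch. Assuming an aldose of n carbons, the open chain has n - 2 chiral centres (C2 to C(n-1)); cyclization adds the anomeric centre at C1 and so doubles the count, while the D/L split (set by the highest-numbered chiral carbon) stays half and half. The helper name is hypothetical.

```python
def aldose_stereoisomers(n_carbons: int, cyclic: bool = False) -> tuple[int, int, int]:
    """Return (total, D-count, L-count) stereoisomers for an aldose of n carbons.

    Open chain: carbons 2..n-1 are chiral, giving n - 2 centres.
    Ring form (pyranose or furanose): C1 becomes an extra (anomeric) centre.
    """
    if n_carbons < 3:
        raise ValueError("aldoses with a chiral centre start at the trioses")
    if cyclic and n_carbons < 4:
        raise ValueError("trioses cannot form a ring")
    centres = n_carbons - 2 + (1 if cyclic else 0)
    total = 2 ** centres
    return total, total // 2, total // 2


print(aldose_stereoisomers(6))               # (16, 8, 8)
print(aldose_stereoisomers(6, cyclic=True))  # (32, 16, 16): each hexose as alpha and beta
```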
The 2 stereoisomers are called anomers, α and β.
[Figure: chem_comp leaving atom]

The PDB has rules to include the LINKAGE in the 3-letter code.
Refinement programs (the suppliers of coordinates to the PDB) use "patches" to describe alpha- and beta-, NOT the 3-letter code.
Systematic conventions for representing sugars don't rename alpha-mannose and beta-mannose to MAN and BMA as the PDB does.

A NAG-FUC in the PDB: alpha-L-fucose, or beta-? Beta is PDB FUL, beta-L-fucose, which doesn't exist in glycans.

The process of identifying a new chem_comp in a PDB entry:
1. Find all atoms belonging to a single entity
2. Detect bond orders by software
3. Add appropriate H atoms
4. Generate a SMILES
5. Test if the SMILES generates correct ideal coordinates
6. From the ideal coordinates generate a SMILES
7. From the SMILES generate a chemical name
8. The chem_comp CIF file stores the output; it is not used as input in any step

Identifying an existing chem_comp in a PDB entry:
1. The chem_comp connectivity is extracted and a graph made for each compound.
2. As above, all atoms belonging to a chemical entity are found and its connectivity graph is compared to the dictionary to find the correct match.
3. The crucial step is finding LINKed atoms that may belong to the entity in question. For carbohydrates in the PDB, in a glycosidic bond C1(i)-X(i+1) the oxygen on C1(i) is named in residue (i+1), but during identification it is attached "temporarily" to the sugar to determine the C1 stereochemistry. So in an Asn-NAG, O1 is the Asn nitrogen atom.

LEAVING ATOM: a frequent problem
If the Asn-NAG N to C distance is > 2.0 Å, the sugar would end up as 5AX, the dehydroxy NAG at C1 (plus the angle is impossible).

LEAVING ATOM and alpha-/beta- linkage
Note this is similar to the peptide bond in proteins, but the leaving atom is assumed in all protein software and the LINK is independent of the 3-letter code, i.e.
you can have a cis or trans peptide: trans is assumed, while cis is given external to the residue name as CISPEP.
All glycobiology gives the sugar linkage and C1 stereochemistry external to the sugar name. Only the PDB has BMA and MAN to represent beta-mannose and alpha-mannose; everywhere else mannose is mannose, "Man". All refinement software (the suppliers to the PDB) uses MAN and a link "patch". A historical legacy we could do without!

PROBLEM 3: Conformation (minor)
[Figure: chair forms of α-D-glucopyranose and β-D-glucopyranose, ring atoms numbered]
Because of the tetrahedral nature of carbon bonds, pyranose sugars actually assume a "chair" or "boat" conformation, depending on the sugar.
[Figure: conformational formulas of pyranoses]

Conformation
Sugar ring pucker is not always fitted well to density. This does not interfere with identification, except where bond lengths and angles may cause processing software to confuse single and double bonds.
The conformation of the ring is dominated by steric interactions between axial groups. In hexopyranoses this causes a strong preference for the less crowded ⁴C₁ conformation in the D-series (¹C₄ in the L-series), as this places C-6 in an equatorial position. In pentoses, furanoses and unsaturated pyranoses the differences in steric energy between conformations are much smaller, so the conformation is often determined by the anomeric effect. The term anomeric effect describes the preference for placing electronegative substituents anti to the electron pair of a heteroatom, i.e. oxygen.
The debate on the ribose ring pucker in DNA and RNA may have ceased, but it is not resolved.

PROBLEM 4: Glycosidic Bonds
Glycoprotein carbohydrate moieties are inherently:
(a) Variable: variable site occupancy, and variable structures at each site
(b) Flexible
[Figure: B7-1]
These are exactly why glycosylation is avoided in constructs for crystallisation!
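To make the MAN/BMA legacy described under Problem 2 concrete, the sketch below translates a refinement-style description (a residue plus an alpha/beta patch) into the PDB code that bakes the anomer into the name. It covers only the codes named in these notes (MAN, BMA, NAG, NDG, FUC, FUL); the residue labels such as "D-Man" and the function names are hypothetical, and a real check would draw on the full chem_comp dictionary.

```python
# PDB chem_comp codes that encode the C1 anomer in the residue name
# (limited to codes mentioned in these notes; labels like "D-Man" are illustrative).
PDB_CODE = {
    ("D-Man", "alpha"): "MAN",
    ("D-Man", "beta"): "BMA",
    ("D-GlcNAc", "alpha"): "NDG",
    ("D-GlcNAc", "beta"): "NAG",
    ("L-Fuc", "alpha"): "FUC",
    ("L-Fuc", "beta"): "FUL",  # beta-L-fucose: not found in glycans
}


def pdb_code(sugar: str, anomer: str) -> str:
    """Translate 'residue + alpha/beta patch' (refinement style) into a PDB code."""
    key = (sugar, anomer.lower())
    if key not in PDB_CODE:
        raise KeyError(f"no PDB code listed here for {anomer} {sugar}")
    return PDB_CODE[key]


def check_deposited(sugar: str, anomer: str, deposited_code: str) -> bool:
    """True if the deposited 3-letter code matches the anomer that was modelled."""
    return pdb_code(sugar, anomer) == deposited_code


# A mannose refined with a beta patch but deposited under the generic name "MAN":
print(check_deposited("D-Man", "beta", "MAN"))  # False: the PDB expects BMA
```

The design point is that the anomer travels with the linkage in refinement software but with the residue name in the PDB, so any hand-off between the two needs an explicit translation step like this one.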
Glycosidic Bonds
The anomeric hydroxyl and a hydroxyl of another sugar or some other compound can join together, splitting out water to form a glycosidic bond:
R-OH + HO-R' → R-O-R' + H2O
E.g., methanol reacts with the anomeric OH on glucose to form methyl glucoside (methyl-glucopyranose):
[Scheme: α-D-glucopyranose + CH3-OH → methyl-α-D-glucopyranose + H2O]

Glycosidic bonds determine structure:
- Straight chains, good for structure
- Bent chains, good for storage

Both glycosides and oligo-/polysaccharides are built of compounds linked by a glycosidic bond:
- Glycoside: a non-sugar molecule with free -OH or -NH2 groups (the aglycone) linked to a monosaccharide, oligosaccharide or polysaccharide with a free -OH at C1
- Disaccharide: a monosaccharide linked to a monosaccharide

Chemical structure: is what was submitted correct?
A major problem in the PDB is that authors will always check UniProt for the correct amino acid sequence and GenBank for the correct DNA/RNA sequences, but never check whether the sugars built into density actually exist for the species under study, for either the extracted source or the expressed source.

PROBLEM 5: Structural Features
H-bonding opportunities. Cellulose: H-bonds add strength.
Secondary and tertiary structure:
- Rotational freedom, hydrogen bonding, oscillations
- Local (secondary) and overall (tertiary)
- Random coil, helical conformations
Movement around bonds (figure from http://www.sbu.ac.uk/water/hydro.html)

Frequently used definitions of glycosidic torsion angles:

Angle | NMR style | C-1 crystallographic style | C+1 crystallographic style
ϕ | H1-C1-O-C′x | O5-C1-O-C′x | O5-C1-O-C′x
ψ | C1-O-C′x-H′x | C1-O-C′x-C′(x−1) | C1-O-C′x-C′(x+1)
ψ [(1-6)-linkage] | C1-O-C′6-C′5 | C1-O-C′6-C′5 | C1-O-C′6-C′5
ω [(1-6)-linkage] | O-C′6-C′5-H′5 | O-C′6-C′5-C′4 | O-C′6-C′5-O′5

ASN: well in modelling, if not in crystallography.
Polysaccharide equivalents to phi/psi in proteins are not used. Proteins are routinely, without question, validated for allowed phi/psi torsion angles. Polysaccharides have a wider range of allowed torsion angles but
there are clear preferences, and these are universally ignored.

Tertiary structure: steric/geometric conformations
Rule of thumb: the overall shape of the chain is determined by the geometrical relationship within each monosaccharide unit:
- β(1→4): zig-zag, ribbon-like
- β(1→3) and α(1→4): U-turn, hollow helix
- β(1→2): twisted, crumpled
- (1→6): no ordered conformation

Assignment for next lecture
Today has been a general view of sugars in the PDB.
For next week, find all instances of the following 4 example groups of sugar compounds.
Caution: the compounds may be given as a single 3-letter code or as a LINKed set of chem_comps.
Find the common name and, if it is a natural metabolite, the source organism.
EXCLUDE all phosphate and nucleotide examples.

Group 1: Sugar(s) attached to a ring system (from 10- to 20-membered ring): macrolactone, polyketide antibiotics. Not all ring systems with sugars attached are macrolides.
Group 2: Glycosylated relatives of the anthracycline family that is given as a treatment for some types of cancer, e.g. daunomycin.
Group 3: Clue: look at PDB 1qff. Anything that looks vaguely like a lipopolysaccharide.
Group 4: Use textual searching for anything that could be a blood group antigen.
NOTE: these are usually linked monosaccharides, so there is no COMPND/MOLECULE name [BUT not always, e.g. look at DR3, THE LEWIS B HUMAN BLOOD GROUP DETERMINANT], and the TITLE may be misleading: for a related series of PDB entries it may be replicated to say "complex with a blood group" even for apo structures.
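For the assignment caution that a compound may be a LINKed set of chem_comps, one hedged approach is to treat LINK records as graph edges and collect connected components with union-find. This sketch assumes legacy PDB-format LINK records with fixed columns (1-based: resName 18-20, chain 22, resSeq 23-26 for the first atom; 48-50, 52, 53-56 for the second). Compounds given as a single 3-letter code with no LINK will not appear and still need a HETATM search. The function names and the optional `keep` filter are hypothetical; `keep` lets you drop, for example, protein residues so a glycan is not merged with the chain it is attached to.

```python
def parse_link_residues(line: str):
    """(resName, chain, resSeq) for each end of a fixed-width PDB LINK record."""
    a = (line[17:20].strip(), line[21].strip(), int(line[22:26]))
    b = (line[47:50].strip(), line[51].strip(), int(line[52:56]))
    return a, b


def linked_groups(pdb_lines, keep=lambda res: True):
    """Group residues joined by LINK records into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for line in pdb_lines:
        if line.startswith("LINK"):
            a, b = parse_link_residues(line)
            if keep(a) and keep(b):
                parent[find(a)] = find(b)  # union the two components

    groups = {}
    for res in parent:
        groups.setdefault(find(res), set()).add(res)
    # Deterministic output: each group sorted, groups ordered by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Each returned group is a candidate multi-residue entity (for example a glycan or a sugar attached to a macrolactone ring) whose members can then be looked up by name.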