Overview of Problems with Carbohydrates in the PDB

“...while the functions of DNA and
proteins are generally known.....it is much
less clear what carbohydrates do...”
Ciba Foundation Symposium 1988
A lesson in doing this project
Performance, Feedback, Revision
http://the273.com/2011/05/24/baba-brinkman-performance-feedback-revision-video/
Link provided by Helen Berman
Priorities change
No point you have had this
Major part of PDB but not that interesting
Most interesting chemistry
Important to understand first
Every step you do changes the next steps to be done
New Schedule
Carbohydrates and the PDB
Natural Product Carbohydrates
N- and O-Glycans
Don't know – see what is appropriate
How Much more Complex is the Glycome of an
organism in Comparison with its Genome?
[Figure: the genome (DNA), transcriptome (RNA), proteome (proteins), zymome? (enzymes), lipome (lipids) and glycome (glycans, sugar chains)]
Variations in structure, time
and space.
Changes in response to
environment
Diversity of structures, Information
carrying potential
Laine, RA (1994) “A Calculation of all Possible Oligosaccharide Isomers, Both Branched and Linear Yields 1.05 x 10^12 Structures for a Reducing Hexasaccharide: The Isomer Barrier to Development of Single-Method Saccharide Sequencing or Synthesis Systems” Glycobiology 4:759-767.
Proteins                            | Polysaccharides
well defined                        | often poorly defined
coded precisely by genes            | synthesised by enzymes without template
monodisperse                        | polydisperse, and generally larger
~20 building block residues         | many homopolymers, and rarely >3 or 4 different residues
standard peptide link               | various links: α(1→1), α(1→2), α(1→4), α(1→6), β(1→3), β(1→4), etc.
normally tightly folded structures  | range of structures (rod to coil)
Some proteins do not possess a folded structure, e.g. gelatin
Poly(amino acid)s compare with some linear polysaccharides
General Characteristics
In nature, most carbohydrates are found bound
to other compounds rather than as simple
sugars
Polysaccharides (starch, cellulose,
inulin, gums)
Glycoproteins and proteoglycans
(hormones, blood group substances,
antibodies)
Glycolipids (cerebrosides, gangliosides)
Glycosides
Mucopolysaccharides (hyaluronic acid)
Nucleic acid polymers
Classification of Carbohydrates
Carbohydrates can be classified by size:
 Monosaccharides (monoses or glycoses)
Trioses, tetroses, pentoses, hexoses
 Oligosaccharides
Di, tri, tetra, penta …up to 10
(The disaccharides are the most important)
 Polysaccharides (or glycans)
Homopolysaccharides (all the same type)
Heteropolysaccharides (mixtures of monomer types)
Complex carbohydrates (joined to non-carbohydrate molecules)
Derivatives of monosaccharides with biological activities:
1. Phosphate and sulphate esters
2. Alditols
3. Aldonic and uronic acids
4. Deoxysugars
5. Aminosugars
6. Family of sialic acids
7. N-acetylmuramic acid
8. Glycosides
What are you searching?
June 27th 2011
Number of PDB entries                73951
Number of chem_comp                  14206
HETNAM records in PDB files         132117
Number of chem_comp in HETNAM        12111
Number of chem_comp Released         12289
Number of chem_comp Hold              1363
Number of chem_comp Obsolete           381
Sum (Released + Hold + Obsolete)     14033
Number in REMOVED list                 381 (REMOVED is not the same as Obsolete)
351 chem_comp are missing from the count in PDB + OBS + HOLD; these must be the number in remediation, either new or obsoleted, from antibiotics/inhibitors
Searching chem_comp in isolation from the PDB entries is not recommended: check whether a chem_comp exists, and check the LINK records to see whether the instance was built correctly
Note: after the remediation release, hundreds of chem_comp change status
The majority of potential chemical entities
in PDB exist in a small number of Entries
8074 chem_comp appear in 1 entry
1554 chem_comp appear in 2 entries
628 chem_comp appear in 3 entries
365 chem_comp appear in 4 entries
226 chem_comp appear in 5 entries
167 chem_comp appear in 6 entries
124 chem_comp appear in 7 entries
98 chem_comp appear in 8 entries
73 chem_comp appear in 9 entries
678 chem_comp appear in 10 to 100 entries
47 chem_comp appear in 101 to 200 entries
For uncommon groups check the PDB entry !!!!
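A distribution like the one above can be tallied by counting, for each chem_comp, the number of entries that name it. The following is a minimal sketch, not the procedure used for these slides: it assumes each entry's records are available as text lines, that the 3-letter code sits in columns 12-14 of a HETNAM record, and the `entries` mapping with its identifiers is hypothetical toy data.

```python
from collections import Counter


def het_ids(pdb_lines):
    """Return the set of chem_comp codes named in HETNAM records.

    Assumes fixed-column PDB format with the code in columns 12-14.
    """
    ids = set()
    for line in pdb_lines:
        if line.startswith("HETNAM"):
            code = line[11:14].strip()
            if code:
                ids.add(code)
    return ids


def entries_per_comp(entries):
    """Count, for each chem_comp, the number of entries it appears in."""
    counts = Counter()
    for lines in entries.values():
        counts.update(het_ids(lines))
    return counts


def histogram(counts):
    """Map 'number of entries' to 'how many chem_comp appear that often'."""
    return Counter(counts.values())


# Hypothetical toy data, not real PDB content.
entries = {
    "XXX1": ["HETNAM     NAG N-ACETYL-D-GLUCOSAMINE",
             "HETNAM     SO4 SULFATE ION"],
    "XXX2": ["HETNAM     NAG N-ACETYL-D-GLUCOSAMINE"],
    "XXX3": ["HETNAM     FUL BETA-L-FUCOSE"],
}
counts = entries_per_comp(entries)
hist = histogram(counts)
for n_entries in sorted(hist):
    print(f"{hist[n_entries]} chem_comp appear in {n_entries} entries")
```

A chem_comp that falls in the "1 entry" bin is exactly the kind of uncommon group for which the entry itself should be inspected.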
Top 77 chem_comp by count of released pdb entries
Includes 8 sugars
SO4  8761 | MN   1541 | PEG   609 | FUC   437 | FE2   339 | IPA   258
ZN   6625 | FAD  1102 | NH2   600 | SAH   435 | EPE   339 | CSO   257
MG   6096 | K    1099 | PLP   600 | NDG   429 | ANP   329 | COA   246
MSE  5909 | ADP   974 | CD    572 | CIT   427 | DMS   324 | KCX   237
GOL  5664 | MAN   940 | ATP   562 | PG4   423 | TPO   322 | BOG   236
CA   5575 | FE    847 | NI    541 | GLC   401 | AMP   309 | CMO   234
CL   4806 | ACE   760 | FMN   538 | GAL   399 | NDP   309 | LLP   227
NA   3101 | NAD   756 | ACY   516 | SEP   387 | IOD   305 | GTP   222
NAG  2697 | BMA   738 | TRS   505 | PCA   386 | IMD   291 | HEC   221
HEM  2571 | CU    711 | GDP   493 | BGC   377 | UNX   280 | UNL   218
PO4  2546 | BME   649 | FMT   480 | CO    376 | PGE   276 | CO3   214
EDO  2236 | NAP   644 | MES   473 | FES   359 | PTR   271 | MRD   207
ACT  1801 | MPD   634 | SF4   442 | HG    354 | NO3   263 |
NDG and NAG
Common error in N-linked N-acetyl-D-glucosamine attached to asparagine
There are 429 cases (it was ~200 in May 2007, so annotation/deposition is still not alerting the depositor) for which we have had to assign, by stereochemistry matching, the incorrect
2-(acetylamino)-2-deoxy-α-D-glucopyranose (NDG)
rather than the correct
2-(acetylamino)-2-deoxy-β-D-glucopyranose (NAG).
Asn---NAG is ALWAYS beta, never alpha
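Since an N-linked GlcNAc on asparagine should always be NAG, a deposited NDG bonded to Asn ND2 can be flagged automatically. The sketch below is illustrative only: the `Link` tuple is a hypothetical, already-parsed form of a LINK record, not a PDB or wwPDB data structure, and the example links are invented.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One covalent link between two residues (hypothetical, pre-parsed form)."""
    res1: str
    atom1: str
    res2: str
    atom2: str


def suspicious_asn_links(links):
    """Return links where Asn ND2 is bonded to C1 of an alpha-GlcNAc (NDG).

    N-linked GlcNAc on asparagine is always beta, so the expected
    chem_comp is NAG; an NDG here points to a building or naming error.
    """
    flagged = []
    for link in links:
        ends = {(link.res1, link.atom1), (link.res2, link.atom2)}
        if ("ASN", "ND2") in ends and ("NDG", "C1") in ends:
            flagged.append(link)
    return flagged


# Invented example links.
links = [
    Link("ASN", "ND2", "NAG", "C1"),  # expected beta linkage
    Link("ASN", "ND2", "NDG", "C1"),  # alpha anomer on Asn: flag it
    Link("NDG", "C1", "ASN", "ND2"),  # same problem, partners swapped
    Link("NAG", "O4", "NAG", "C1"),   # sugar-sugar link: ignored
]
flagged = suspicious_asn_links(links)
print(len(flagged), "suspicious Asn-NDG links")
```

A flagged link still needs a look at the density: either the residue was built as the wrong anomer, or it was built correctly and only the assigned name is wrong.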
Deposition of PDB Entries
Refinement programs use geometric restraints as part
of refinement. For protein structures accurate bond and
angle parameters are based on parameters derived from
a statistical survey of X-ray structures of small
compounds from the Cambridge Structural Database.
(R. A. Engh and R. Huber). Other restraints for proteins,
nucleic acids, and other common molecules come from
the CCP4 monomer library.
Deposition of PDB Entries
These restraints are used in refinement to prevent
distortions of model geometry, and to increase the
observation-to-parameter ratio. The default
restraints are for bond lengths, bond angles,
dihedral (torsion) angles, chiral centers, planar
groups (such as aromatic rings), and nonbonded
(VDW) interactions.
Refinement Restraints for Carbohydrates
Although geometry restraints for carbohydrates exist, they are not always used, with the result that there are geometry errors in deposited files.
Many of the stereochemical errors can be detected by reference to conformational studies of glycans and to publicly available resources (http://www.glycosciences.de/tools/).
However, these errors also indicate that there is a wide
discrepancy in the sophistication of building and
validation tools available for protein and carbohydrate
models.
The PDB does not contain any N-linked glycan unknown to glycobiology. Resources that depositors should use:
http://www.glycome-db.org/
http://www.glycostructures.jp/
http://www.cbs.dtu.dk/databases/OGLYCBASE/
http://www.glycoforum.gr.jp/
http://www.genome.jp/ligand/kcam/
http://www.functionalglycomics.org/static/index.shtml
http://www.glyco.ac.ru/bcsdb3/
http://www.casper.organ.su.se/ECODAB/
http://www.functionalglycomics.org/static/gt/gtdb.shtml
http://akashia.sci.hokudai.ac.jp/
http://hexose.chem.ku.edu/sugar.php
http://www.eurocarbdb.org/
http://glyco3d.cermav.cnrs.fr/glyco3d/
http://www.glycosciences.de/modeling/sweet2/doc/index.php
First of all
A GOOD carbohydrate
PDB 1qbb
Di-(N-Acetyl-D-glucosamine)
NOTE the role of aromatic amino acid side chains in controlling stereochemical selection
NOTE also the two positions for the reducing-end O atom: under PDB rules this would be 2 chem_comps, but here alpha- and beta- are the same compound
Crystallographic Inventions
Man-(1→3)-GlcNAc and GlcNAc-(1→3)-GlcNAc linkages (of indeterminate anomericity) within the trimannosyl core,
hybrid-type glycans containing a terminal Man-(1→3)-GlcNAc linkage on the 3-antennae,
β-galactosyl motifs capping oligomannose-type glycans.
Entry 2H6O
Crystallographic Inventions
The pilin glycans from Neisseria species share a common structure, in particular with respect to the unusual O-linked sugar residue 2,4-diacetamido-2,4,6-trideoxyhexose (DATDH).
However, in the PDB (1AY2, 2PIL), the pilin structures from Neisseria gonorrhoeae show a galactose-α-1,3-N-acetylglucosamine-serine.
In later PDB entries (2HI2, 2HIL), the correct sugar, 2,4-bis(acetylamino)-1,5-anhydro-2,4-dideoxy-D-glucitol, is reported O-linked to serine.
1AY2: incorrect
2HI2: correct
PROBLEM 1 D- vs L- Designation
D & L sugars are mirror images of one another
They have the same root name (but a different D/L designation), e.g. D-glucose & L-glucose
Other stereoisomers have unique names (e.g. glucose, mannose, galactose, etc)

   D-glucose        L-glucose
      CHO              CHO
  H – C – OH      HO – C – H
 HO – C – H        H – C – OH
  H – C – OH      HO – C – H
  H – C – OH      HO – C – H
      CH2OH            CH2OH
The number of stereoisomers is 2^n, where n is the number of asymmetric (chiral) centers
The 6-C aldoses have 4 asymmetric centers. Thus there are 2^4 = 16 stereoisomers (8 D-sugars and 8 L-sugars).
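The count can be checked by enumeration. This is a small sketch under a stated convention: each chiral centre of the open-chain aldohexose is labelled by whether its OH is drawn on the right or left of the Fischer projection (a drawing convention, not CIP R/S), and D/L is read from the highest-numbered chiral centre, C5.

```python
from itertools import product

N_CENTRES = 4  # C2, C3, C4, C5 of an open-chain aldohexose

# Every combination of OH-right / OH-left over the four centres.
isomers = list(product(("right", "left"), repeat=N_CENTRES))

# D/L depends only on the last chiral centre (C5): OH on the right is D.
d_sugars = [iso for iso in isomers if iso[-1] == "right"]
l_sugars = [iso for iso in isomers if iso[-1] == "left"]

# D-glucose as drawn in the Fischer projection (C2, C3, C4, C5).
d_glucose = ("right", "left", "right", "right")
# Its mirror image flips every centre.
l_glucose = tuple("left" if side == "right" else "right" for side in d_glucose)

print(len(isomers), len(d_sugars), len(l_sugars))
```

The D or L label fixes one centre only; the remaining three are carried by the root name (glucose, mannose, galactose and so on).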
D and L tell you nothing about stereochemistry
The result is that authors who refine with a standard residue, e.g. mannose, plus a linkage or alpha-/beta- C1-OH patch don't necessarily deposit the chem_comp name the PDB requires: MAN for alpha-mannose or BMA for beta-mannose.
If you used R and S per chiral centre, no chemist would recognise that you are describing a sugar, but the stereochemistry would be exactly defined and mistakes avoided
PROBLEM 2 alpha- beta at C1
[Figure: ring drawings of α-D-glucose and β-D-glucose, carbons numbered 1 to 6; the two differ only in the orientation of the OH at C1]
Cyclization of glucose produces a new asymmetric center at C1. The 2 stereoisomers are called anomers, α & β
Chem_Comp LEAVING ATOM
The PDB has rules that include the LINKAGE in the 3-letter code
Refinement programs (the suppliers of coordinates to the PDB) use “patches” to describe alpha- and beta-, NOT the 3-letter code
Systematic conventions for representing sugars don’t rename alpha-Mannose and beta-Mannose to MAN and BMA as the PDB does
A NAG-FUC in PDB
Alpha-L-Fucose or Beta-?
This is PDB FUL, Beta-L-Fucose, which doesn’t exist in glycans
The process of identifying a new chem_comp in
a PDB entry
1. Find all atoms belonging to a single entity
2. Detect bond orders by software
3. Add appropriate H-atoms
4. Generate a SMILES
5. Test if SMILES generate correct ideal coordinates
6. From ideal coordinates generate a SMILES
7. From SMILES generate chemical Name
8. Chem_comp CIF file stores the output, it is not used as input in any step
Identifying an existing chem_comp in a PDB
entry
1. The chem_comp connectivity is extracted and a graph made for each compound
2. As above: all atoms belonging to a chemical entity are found and its connectivity graph is compared to the dictionary to find the correct match
3. The crucial step is finding LINKed atoms that may belong to the entity in question. In carbohydrates in the PDB, in a glycosidic bond C1(i) --- X(i+1), the oxygen of C1(i) is named in the (i+1) residue, but in identification it is attached “temporarily” to the sugar to determine the C1 stereochemistry. So in an Asn-NAG, O1 is the Asn nitrogen atom
LEAVING ATOM – frequent problem
If the Asn-NAG N to C distance is > 2.0 Å, the sugar would end up as 5AX, the NAG dehydroxylated at C1
(plus the angle is impossible)
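A distance test of this kind is easy to sketch. The example below is an assumption-laden illustration rather than the PDB processing code: the 2.0 Å cutoff is taken from the case described above, and the coordinates are invented.

```python
import math

MAX_BOND_LENGTH = 2.0  # Angstrom; illustrative cutoff, assumed for this sketch


def is_bonded(xyz_a, xyz_b, cutoff=MAX_BOND_LENGTH):
    """True if two atoms are close enough to be treated as covalently linked."""
    return math.dist(xyz_a, xyz_b) <= cutoff


# Invented coordinates (Angstrom) for Asn ND2 and two candidate NAG C1 positions.
asn_nd2 = (10.00, 5.00, 3.00)
nag_c1_good = (11.20, 5.60, 3.40)  # within a normal N-C bonding distance
nag_c1_bad = (12.10, 5.90, 3.80)   # too far away to be accepted as a bond

for label, c1 in (("good", nag_c1_good), ("bad", nag_c1_bad)):
    distance = math.dist(asn_nd2, c1)
    if is_bonded(asn_nd2, c1):
        status = "linked, ND2 stands in for O1"
    else:
        status = "NOT linked, C1 has no leaving atom"
    print(f"{label}: ND2-C1 = {distance:.2f} A, {status}")
```

When the test fails, the sugar is matched without any atom in the O1 position, which is how a NAG can be identified as the C1-dehydroxy compound instead.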
LEAVING ATOM & alpha- beta- linkage
Note this is similar to the peptide bond in proteins, but
the leaving atom is assumed in all protein software
and the LINK is independent of the 3-letter code, i.e.
you can have a cis or trans peptide and trans is
assumed while cis is given external to the residue
name as CISPEP
All glycobiology gives the sugar linkage and C1
stereochemistry external to the sugar name – only the
PDB has BMA and MAN to represent beta-mannose
and alpha-mannose – everywhere else mannose is
mannose “man”. All refinement software (the
suppliers to the PDB) use MAN and a link “patch”.
Historical legacy we could do without !!!!
PROBLEM 3 – Conformation (minor)
[Figure: chair drawings of α-D-glucopyranose and β-D-glucopyranose, ring atoms numbered 1 to 6]
Because of the tetrahedral nature of carbon bonds, pyranose sugars actually assume a "chair" or "boat" conformation, depending on the sugar
Conformational
formulas of
pyranoses
Conformation
Sugar ring pucker not always fitted well to density
This does not interfere with identification
Except where bond lengths and angles may cause
processing software to confuse single and double
bonds
The conformation of the ring is dominated by steric
interactions between axial groups. In hexopyranoses
this causes a strong preference for the less crowded
4C1 conformation in the D-series (1C4 in the L-series) as this places C-6 in an equatorial position.
In pentoses, furanoses and unsaturated pyranoses
the differences in steric energy between
conformations are much smaller so that the
conformation is often determined by the anomeric
effect.
The term anomeric effect is used to describe the
preference for placing electronegative substituents
anti to the electron pair of a heteroatom, i.e. oxygen.
The debate over the ribose ring pucker in DNA and RNA may have ceased, but it is not resolved
PROBLEM 4 Glycosidic Bonds
Glycoprotein carbohydrate moieties
are inherently:
(a) Variable:
Variable site occupancy
Variable structures at each
site
(b) Flexible
B7-1
These are exactly why
glycosylation is avoided in
constructs for crystallisation!
Glycosidic Bonds
The anomeric hydroxyl and a hydroxyl of another sugar or
some other compound can join together, splitting out
water to form a glycosidic bond:
R-OH + HO-R' -> R-O-R' + H2O
E.g., methanol reacts with the anomeric OH on glucose to form methyl glucoside (methyl glucopyranoside).
[Figure: α-D-glucopyranose + methanol (CH3-OH) → methyl α-D-glucopyranoside + H2O]
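The loss of one water per glycosidic bond can be checked on molecular formulas. This is a minimal bookkeeping sketch: formulas are held as element counts, and the `condense` helper is a hypothetical name introduced here for illustration.

```python
from collections import Counter

glucose = Counter({"C": 6, "H": 12, "O": 6})   # C6H12O6
methanol = Counter({"C": 1, "H": 4, "O": 1})   # CH3-OH
water = Counter({"H": 2, "O": 1})              # H2O


def condense(donor, acceptor):
    """Formula of R-O-R' formed from R-OH + HO-R' with loss of one water."""
    product = donor + acceptor   # new Counter, inputs are left untouched
    product.subtract(water)
    return +product              # unary plus drops any zero counts


methyl_glucoside = condense(glucose, methanol)   # C7H14O6
disaccharide = condense(glucose, glucose)        # C12H22O11
print(dict(methyl_glucoside), dict(disaccharide))
```

The same bookkeeping explains the leaving atom problem: each linked sugar in a chain is one water short of the free monosaccharide.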
Glycosidic bonds determine structure
Straight chains,
good for structure
Bent chains,
good for storage
Both glycosides and oligo-/polysaccharides are built of compounds linked by glycosidic bonds
Glycosides: a non-sugar molecule with free –OH or -NH2 groups (the aglycone) linked to a monosaccharide, oligosaccharide or polysaccharide with a free -OH at C1
Monosaccharide + monosaccharide: a disaccharide
Chemical structure – submitted to correct??
A major problem in the PDB is that authors will always check UniProt for the correct amino acid sequence and GenBank for the correct DNA/RNA sequences
but never check whether the sugars built into density actually exist for the species under study, for either the extracted source or the expressed source
PROBLEM 5
Structural Features:
H-bonding opportunities
Cellulose: H-bonds add strength
Secondary & Tertiary Structure
 Rotational
freedom
 hydrogen bonding
 oscillations
 local (secondary) and overall
(tertiary) random coil, helical
conformations
Movement around bonds:
from: http://www.sbu.ac.uk/water/hydro.html
Frequently used definitions of glycosidic torsion angles
Angle             | NMR style     | C−1 crystallographic style | C+1 crystallographic style
ϕ                 | H1—C1—O—C′x   | O5—C1—O—C′x                | O5—C1—O—C′x
ψ                 | C1—O—C′x—H′x  | C1—O—C′x—C′x−1             | C1—O—C′x—C′x+1
ψ [(1–6)-linkage] | C1—O—C′6—C′5  | C1—O—C′6—C′5               | C1—O—C′6—C′5
ω [(1–6)-linkage] | O—C′6—C′5—H′5 | O—C′6—C′5—C′4              | O—C′6—C′5—O′5
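Whichever definition is chosen, each angle is an ordinary four-atom torsion. The sketch below uses only the standard library; `glycosidic_phi_psi` is a hypothetical helper that follows the "C+1 crystallographic style" definitions above, and the argument names are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def glycosidic_phi_psi(o5, c1, o_link, cx, cx_next):
    """phi = O5-C1-O-C'x and psi = C1-O-C'x-C'x+1 (C+1 crystallographic style)."""
    return torsion(o5, c1, o_link, cx), torsion(c1, o_link, cx, cx_next)


# Toy geometry (not a real sugar) to show the sign convention.
print(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
```

Values computed with different definitions are not directly comparable, so the convention used should be recorded alongside any tabulated angles.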
ASN
Well in modelling
If not crystallography
Polysaccharide equivalents to phi/psi in
proteins are not used
Proteins are routinely without question
validated for allowed phi/psi torsion
angles
Polysaccharides have a wider range of
allowed torsion angles but there are clear
preferences – all universally ignored
Tertiary structure - sterical/geometrical
conformations
Rule-of-thumb: the overall shape of the chain is determined by the geometrical relationship within each monosaccharide unit
β(1→4) - zig-zag - ribbon like
β(1→3) & α(1→4) - U-turn - hollow helix
β(1→2) - twisted - crumpled
(1→6) - no ordered conformation
Assignment for next lecture
Today has been a general view of sugars in the PDB
For next week: find all instances of the following 4 example groups of sugar compounds
Caution: the compounds may be given as a single 3-letter code or as a LINKed set of chem_comps
Find the common name and, if a natural metabolite, the source organism
EXCLUDE all phosphate and nucleotide examples
Group 1
Sugar(s) attached to a ring system (from 10 to 20 membered ring)
Macrolactone, Polyketide Antibiotics
Not all ring systems with sugars attached are Macrolides
Group 2
Glycosylated relatives of the anthracycline family, which is given as a treatment for some types of cancer, e.g. Daunomycin
Group 3
Clue: look at PDB 1qff
Anything that looks vaguely like a Lipopolysaccharide
Group 4
Use textual searching for anything that could be a Blood Group Antigen
NOTE: Usually linked monosaccharides, so no COMPND/MOLECULE name
[BUT not always, e.g. look at DR3]
THE LEWIS B HUMAN BLOOD GROUP DETERMINANT
and the TITLE may be misleading: for a related series of PDB entries it may be replicated to say complex with a blood group even for apo structures