A Calculation of All Possible Oligosaccharide Isomers, Both Branched and Linear, Yields $1.05 \times 10^{12}$ Structures for a Reducing Hexasaccharide
The Isomer Barrier to Development of Single-Method Saccharide Sequencing or Synthesis Systems
Carbohydrate IT





Carbohydrates, by their unique branching structure, contain an
evolutionary potential of information content several orders of
magnitude higher in a short sequence than in any other biological
oligomer.
This study addresses informational potential inherent in biological
recognition systems comprised of complex carbohydrate ligands
recognized for targeted activities by specifically binding cognate
protein receptors, such as lectins.
Evolution of receptor/ligand cognate pairs in carbohydrates is
complex and probably very slow.
Single point mutations in glycosyl transferase proteins are not likely
to alter sugar structures, except in cases where a minor amino acid
change could alter recognition among closely related sugars
comprising otherwise the same structure [1].
[1] Yamamoto, F.; Hakomori, S.-i. J Biol Chem 1990, 265: 19257-19262.


The polypeptide-based carbohydrate recognition information
is carried in one or more genes.
Evolution of biological recognition of just one additional
sugar on an existing structure may require a combination of
the following:


1) mutation of the peptide sequence of an existing glycosyl
transferase (more likely), or evolution of a novel glycosyl transferase,
to make a novel carbohydrate structure, and
2) evolution of a new lectin binding site or new lectin to contain the
new binding/recognition site.


The complex carbohydrate cognate is coded into a
specifically ordered set of glycosyl transferase genes
where each N-1 precursor is part of the recognition
system in the binding site of the “next in-order”
glycosyl transferase for acceptance of the next sugar
in the sequence from a high energy donor.
Understanding the evolution, genetic control and
organization of these newly discovered
carbohydrate-protein recognition systems will be a
significant research challenge.






In all biological heteropolymers, the linear sequence of
monomers comprises, in some manner, a biological code.
The ability of proteins to conform in a concave or convex
manner to recognize all other biological molecules includes
recognition of complex carbohydrates.
Proteins such as lectins, enzymes and antibodies can exhibit
exquisite binding specificities for the shape, charge, epimers,
anomers, linkage positions, ring size, branching and
monosaccharide sequence of carbohydrate ligand molecules
where the maximum recognized size is usually hexamer or
smaller [2].
[2] Cisar, J.; Kabat, E.A.; Dorner, M.M.; Liao, J. J Exp Med 1975, 142: 435-459.
Takeo, K.; Kabat, E.A. J Immunol 1978, 121: 2305-2310.
Smith-Gill, S.J.; Rupley, J.A.; Pincus, M.R.; Carty, R.P.; Scheraga, H.A. Biochemistry 1984, 23: 993-997.

Carbohydrate sequences possess unique solution structures
which, although dynamic, are shown by Nuclear Overhauser
Effect NMR and molecular modeling to be populated mainly
by minimum energy 3-dimensional conformations [3].
[3] Miller, K.E.; Mukhopadhyay, C.; Cagas, P.; Bush, C.A. Biochemistry 1992, 31: 6703-6709.
Oligosaccharide haptens, being rather more rigid than short
peptides because of steric crowding [4], must be envisioned
in 3-dimensional space for specific recognition by proteins.
[4] Cumming, D.A.; Carver, J.P. Biochemistry 1987, 26: 6664-6676.
French, A.D.; Mouhous-Riou, N.; Perez, S. Carbohydr Res 1993, 247: 51-62.
Poppe, L.; Dabrowski, J.; von der Lieth, C.W.; Koike, K.; Ogawa, T. Eur J Biochem 1990, 189: 313-325.

Carbohydrate polymers themselves often contain a
complex multifaceted sequence, and specific proteins
can bind to relatively short subsets or haptens within
longer saccharide sequences, such as in heparin [5].
[5] Atha, D.H.; Lormeau, J.C.; Petitou, M.; Rosenberg, R.D.; Choay, J. Biochemistry 1987, 26: 6454-6461.
Riesenfeld, J.; Hook, M.; Bjork, I.; Lindahl, U.; Ajaxon, B. Fed Proc 1977, 36: 39-43.
vanBoeckel, C.A.A.; Petitou, M. Angewandte Chemie Int. Ed. 1993, 32: 1671-1818.

A lectin or other carbohydrate-binding protein can act:
in control mechanisms, such as selectins in inflammation;
as a signal for polypeptide location within the cell, such as
lysosomal protein markers [6];
in single-celled organisms, as a recognition marker for
predation or adhesion;
in viruses, to target cell-surface structures for adhesion and
invasion;
in metazoans, for specific cell-surface recognition of
one cell by another.
[6] Reitman, M.L.; Kornfeld, S. J Biol Chem 1981, 256: 11977-11980.

A large collective of low avidity interactions may
take place to dramatically increase binding strength
where multimeric intercellular binding occurs [7].
A monomer may have a millimolar binding constant
Dimer -> micromolar
Trimer -> nanomolar (multivalency of adhesive sites)
Higher orders -> velcro effect
Specific spacing of carbohydrate moieties within a
structure may confer several orders of magnitude
tighter binding.
[7] Lee, Y.C. Ciba Found Symp 1990, 145: 80-95.

Possible higher complexity might occur where low avidity
binding of patterns of sets of carbohydrates by sets of binding
proteins may form recognition systems which may play a
powerful role in intercellular sociology:
during development [8],
in the immune system [9],
in parasitology [10],
and in other microbial pathogenesis [11].
[8, 9] Brandley, B.K.; Swiedler, S.J.; Robbins, P.W. Cell 1990, 63: 861-870.
Aruffo, A.; Stamenkovic, I.; Melnick, M.; Underhill, C.B.; Seed, B. Cell 1990, 61: 1303-1310.
Polley, M.J.; Phillips, M.L.; Wayner, E.; Nudelman, E.; Singhal, A.K.; Hakomori, S.-i.; Paulson, J.C. Proc. Natl. Acad. Sci. USA 1991, 88: 6224-6229.
Yuen, C.T.; Lawson, A.M.; Chai, W.; Larkin, M.; Stoll, M.S.; Stuart, A.C.; Sullivan, F.X.; Ahern, T.J.; Feizi, T. Biochemistry 1992, 31: 9126-9131.
[10] Feizi, T. Nature 1985, 314: 53-57.
Feizi, T. Adv Exp Med Biol 1988, 228: 317-329.
Friedman, M.J.; Fukuda, M.; Laine, R.A. Science 1985, 228: 75-77.
[11] Srnka, C.A.; Tiemeyer, M.; Gilbert, J.H.; Moreland, M.; Schweingruber, H.; de Lappe, B.W.; James, P.G.; Gant, T.; Willoughby, R.E.; Yolken, R.H.; Nashed, M.A.; Abbas, S.A.; Laine, R.A. Virology 1992, 190: 794-805.

Numerous reviews and recent papers have been written regarding new
discoveries in carbohydrate-based recognition systems such as the
"Selectins" [8],
glycosaminoglycan clotting factors [12],
tumor markers [13],
parasite recognition systems [9],
rhizobium nodulation systems [14],
plant pathogen recognition [15] and
others [16].
[8] (op. cit.)
[9] (op. cit.)
[12] vanBoeckel, C.A.A.; Petitou, M. Angewandte Chemie Int. Ed. 1993, 32: 1671-1718.
[13] Hakomori, S. Am J Clin Pathol 1984, 82: 635-648.
Hoff, S.D.; Irimura, T.; Matsushita, Y.; Ota, D.M.; Cleary, K.R.; Hakomori, S.-i. Arch Surg 1990, 125: 206-209.
[14] Truchet, G.; Roche, P.; Lerouge, P.; Vasse, J.; Camut, S.; de Billy, F.; Prome, J.-C.; Denarie, J. Nature 1991, 351: 670-673.
Fisher, R.F.; Long, S.R. Nature 1992, 357: 655-660.
[15] Maniara, G.; Laine, R.A.; Kuc, J. Physiol. Plant. Pathol. 1984, 24: 177-186.
[16] Karlsson, K.A. Chem. Phys. Lipids 1986, 42: 153-175.

Banausic motives and research by new startup companies have recently driven science to
many new discoveries in immune cell
recognition systems (selectins and others).


Current molecular understanding of this system
alone augurs a giant breakthrough in
immunochemistry.
Taken together, these interesting findings
give bold introduction to a new excitement in
carbohydrate biochemistry.


A growing specialty area of biochemistry concerns itself
with the biology of protein recognition of specific
carbohydrates.
This field was named "Glycobiology" by Raymond
Dwek [17].
[17] Opdenakker, G.; Rudd, P.M.; Ponting, C.P.; Dwek, R.A. FASEB J 1993, 7: 1330-7.
Rademacher, T.W.; Parekh, R.B.; Dwek, R.A. Annu Rev Biochem 1988, 57: 785-838.
The name Glycobiology has also been adopted by a Journal
and a North American Scientific Society of some 1000+
members (formerly the Society for Complex Carbohydrates).


What, therefore, are the structural components that
make carbohydrates so complex,
and what is the magnitude of the potential information
content that higher organisms appear to
have exploited?



Usually, with some exceptions, saccharide binding proteins
recognize a 6 sugar oligomer or smaller.
Within a hexasaccharide sequence comprised of a set of 6
different sugars (hexoses in this example) which may be
repeated or used more than once in a structure, more than
$1.05 \times 10^{12}$ possible carbohydrate structures exist.
In contrast, a set of 6 different amino acids which can be
repeated in permuted structures can only generate $6^6$ = 46,656
different molecules, more than 7 orders of magnitude lower
than possible for carbohydrates.

Carbohydrates have 8 major structural features,
comprising:
1) epimers, including D and L forms;
2) linear sequence of core and of linear branches;
3) ring size, usually 5 or 6 membered;
4) anomeric configuration, α and β;
5) linkage position (i.e., 1->2, 1->3, 1->4, 1->5, 1->6, etc.);
6) branching positions and arborization;
7) reducing terminal attachment (glycoside, acetal, ketal);
8) derivatives (ester, ether, phosphate, sulfate, lactyl, etc.).
All of the above contribute to large numbers of
equal-mass isomers in a short sequence, each potentially
recognizable by specifically binding proteins.

Calculation of the isomers of an oligosaccharide was mentioned in
Nathan Sharon's collected lectures [18] as originating with John Clamp (of
Britain), who estimated 1056 isomers for a trisaccharide comprised of 3
different hexoses.

[18] Sharon, N. Complex Carbohydrates, Their Chemistry, Biosynthesis and Functions; Addison-Wesley
Publishing Company, Advanced Book Program: Reading, Mass., 1975; p. 7.

This calculation was based on:
6 sequence permutations of 3 different monomers ($3!$),
8 permutations of alpha and beta anomeric configurations at each of three
sugars ($2^3$),
16 possibilities of attachment of the reducing terminal and internal sugar (to
the 2, 3, 4 or 6 hydroxyl of their respective aglycones) ($4^2$).
These factors give 6 x 8 x 16 = 768, so
it is not clear how the number 1056 was calculated by Clamp.
Three amino acids in a row, or three nucleotides, would give only 6 isomers ($3!$).
However, by not considering repeating sugars, ring size or
branching, both Clamp and Sharon underestimated the
number of isomers by nearly 2 orders of magnitude.
Richard Schmidt in 1986 published a table showing a calculation of 720
isomers for a trisaccharide, 34,560 for a tetrasaccharide and 2,144,640 for
a pentasaccharide [19].
[19] Schmidt, R.R. Angew. Chem. Int. Ed. Engl. 1986, 25: 212-235.
In 1988, Laine et al. published a formula including a ring size
term, and estimated the resulting number for a linear, reducing
pentasaccharide with non-repeating units as follows [20]:
$n! \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-1}$
[20] Laine, R.A.; Pamidimukkala, K.M.; French, A.L.; Hall, R.W.; Abbas, S.A.; Jain, R.K.; Matta, K.L.
J. Am. Chem. Soc. 1988, 110: 6931-6939.






$n! \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-1}$
where "n" is the number of monosaccharides connected to
each other in an "oligosaccharide",
$2^{n}_{a}$ (subscript "a") is the anomeric term,
$2^{n}_{r}$ (subscript "r") is the ring size term, and
linkage position is represented by $4^{n-1}$.
Employed in a specific calculation for a linear
pentasaccharide comprised of 5 different non-repeating hexoses, this resulted in 31,457,280
isomers, all having the same mass.
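The 1988 formula can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `linear_nonrepeating_isomers` is made up for illustration, and it assumes exactly the four terms described above.

```python
from math import factorial


def linear_nonrepeating_isomers(n: int) -> int:
    """n! * 2^n (anomers) * 2^n (ring sizes) * 4^(n-1) (linkage positions)."""
    sequence_orders = factorial(n)  # each of n different hexoses used once
    anomers = 2 ** n                # alpha or beta at every residue
    ring_sizes = 2 ** n             # pyranose or furanose at every residue
    linkages = 4 ** (n - 1)         # four acceptor positions for each of n-1 bonds
    return sequence_orders * anomers * ring_sizes * linkages


print(f"{linear_nonrepeating_isomers(5):,}")  # 31,457,280
```

For n = 5 this reproduces the 31,457,280 isomers quoted for the linear pentasaccharide.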


However, the number of possible isomers is
actually much larger due to branching and the
natural possibility of repeated monomers.
Carl G. Hellerqvist, in 1990, estimated 2.72
billion possible structures for a hexasaccharide
containing aminosugars, fucose and hexoses [21].
[21] Hellerqvist, C.G. Methods in Enzymology 1990, 193: 554-573.
Hellerqvist's theme was to show how these
numbers are lowered by successive analytical steps.




Sugar monomers are often repeated in natural
carbohydrates just as in peptides.
Repeating saccharides, for example, were considered
in a separate calculation by Richard Schmidt (op.
cit.).
Therefore, in the Clamp/Sharon formula $3! \times 2^3 \times 4^2$ for
the number of possible trisaccharides from a set of 3
hexoses, the first term should have been $3^3 = 27$
instead of $3! = 6$.
The total should also have been multiplied by another
term for ring size, since most hexoses can occur in
either pyranose (6-membered) or furanose (5-membered) forms.





Considering the 5-membered ring would have increased the
result for a trisaccharide by a factor of $2^3$ possibilities, or x 8.
The furanose form presents the possibility that in a
trisaccharide of sequence ABC, sugar A could have been
connected through the 5 position of sugar B, for example,
possibly increasing the number of potential linkage positions
to 5 instead of 4 in a hexose.
However, this factor is taken into account by the ring size
term, keeping the number of possibilities of linkage positions at
$4^2 = 16$.
Thus, the correct number for linear trisaccharides made up
from a set of 3 hexoses is 27 x 8 x 8 x 16 = 27,648.
Remember, a tripeptide, if amino acids repeated, would give only $3^3 = 27$.
Branching oligosaccharides







Oligosaccharides can be made up of 2 sugars or more attached to the same moiety. Consider
sugars A and B attached in different ways to sugar C, for example:

A(1->6)                     B(1->6)
       \                           \
        C(1->R)*     or             C(1->R)
       /                           /
B(1->3)                     A(1->3)

*R = reducing end attachment site (protein, lipid, other aglycon)



Another possibility for the configuration of a
trisaccharide is a branched structure where sugars A
and B are both glycosidically attached to sugar C by
a 2,3; 2,4; 2,6; 3,4; 3,6 or 4,6 branching pattern (six
possibilities).
Where sugar C is in the furanose form, however,
additional branching possibilities include 2,3; 2,5;
2,6; 3,5; 3,6; 5,6 for a total of 12 different branched
structures.
The ring size term $2^{n}_{r}$, however, when applied to the
branching sugar, takes into account the additional 6
structures.
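The two lists of branching patterns can be generated rather than written out by hand. The sketch below assumes that the available hydroxyls are 2, 3, 4, 6 on a hexopyranose and 2, 3, 5, 6 on a hexofuranose, which is the reading implied by the lists above; the constant and function names are illustrative only.

```python
from itertools import combinations

PYRANOSE_HYDROXYLS = (2, 3, 4, 6)  # position 5 is tied up in the 6-membered ring
FURANOSE_HYDROXYLS = (2, 3, 5, 6)  # position 4 is tied up in the 5-membered ring


def branch_patterns(free_hydroxyls, branches=2):
    """All ways to pick `branches` attachment positions on one sugar."""
    return list(combinations(free_hydroxyls, branches))


pyranose_pairs = branch_patterns(PYRANOSE_HYDROXYLS)
furanose_pairs = branch_patterns(FURANOSE_HYDROXYLS)

print(pyranose_pairs)  # [(2, 3), (2, 4), (2, 6), (3, 4), (3, 6), (4, 6)]
print(furanose_pairs)  # [(2, 3), (2, 5), (2, 6), (3, 5), (3, 6), (5, 6)]
print(len(pyranose_pairs) + len(furanose_pairs))  # 12
```

Each ring form contributes 6 patterns, so a factor of 6 combined with the factor of 2 already present in the ring size term accounts for all 12.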
Branched Carbohydrates




Since each branch can occur in two different ways,
such as A6,B3 or B6,A3, there are again 12 different
ways to branch these three sugars.
The permutation term, $E^n$, however, takes care of this
A6,B3 and B6,A3 branching duplex.
Possible branched trisaccharides from a set of 3
hexoses, each one unique and different from the
linear structures, number 27 x 8 x 8 x 6 = 10,368:
$E^n \times 2^{n}_{r} \times 2^{n}_{a} \times 6^{n-2}$ (branched forms)



The total number of structures for a trisaccharide comprised of
3 hexoses, choosing among a set of only 3 different
hexoses, is 27,648 (linear forms) plus 10,368
(branched forms) = 38,016.
This number is about 40-fold higher than Clamp's, Sharon's
or Schmidt's estimates of 720 to 1056. The formula for isomers
of a trisaccharide having a reducing end is thus:
$E^n \times 2^{n}_{r} \times 2^{n}_{a} \times 4^{n-1}$ (linear forms)
+
$E^n \times 2^{n}_{r} \times 2^{n}_{a} \times 6^{n-2}$ (branched forms)
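As a quick check of the trisaccharide total, the linear and branched terms can be evaluated side by side. This is a sketch under the assumptions stated in the text (a set of 3 hexoses, repeats allowed, 6 two-branch patterns on the branching sugar); `trisaccharide_isomers` is a hypothetical helper name.

```python
def trisaccharide_isomers(hexose_set: int = 3) -> dict:
    """Linear plus branched reducing trisaccharides from a set of hexoses."""
    n = 3
    shared = hexose_set ** n * 2 ** n * 2 ** n  # E^n, ring size term, anomeric term
    linear = shared * 4 ** (n - 1)              # 4 linkage positions on each of 2 bonds
    branched = shared * 6                       # 6 two-branch patterns on sugar C
    return {"linear": linear, "branched": branched, "total": linear + branched}


print(trisaccharide_isomers())
# {'linear': 27648, 'branched': 10368, 'total': 38016}
```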
Analytical Challenge







Use of NMR as a single spectroscopic method:
Each trisaccharide would contain 15 ring protons, including the
anomeric protons;
thus the proton NMR spectrum would need to resolve 38,016 x 15 =
570,240 "different" proton environments within 0.5 ppm (the natural
dispersion for the ring protons; the anomeric protons are downfield).
It is doubtful that a tenth of this number of lines could be resolved
using multi-dimensional proton NMR (which would require a terahertz instrument).
Today, a gigahertz NMR instrument is about the practical limit.
In fact, the carbon-13 spectrum, thirty times more dispersed, would
need to resolve 38,016 x 18 carbons = 684,288 lines if they happened
all to be different, an impossibility.
NMR by itself, therefore, cannot be used to absolutely identify
trisaccharides or higher oligomers by virtue of chemical shift values.
Analytical Challenge

As for mass spectrometry:
All 38,016 trisaccharide isomers have the same mass.
Partial fragmentation in collisionally activated mass
spectrometry might provide the combination of partial
degradation and spectral patterns to resolve such
parameters as position of linkage [22] or anomeric configuration [22].
But this approach may not be sufficient without other
sensitive chemical manipulations.
[22] Mendonca, S.; Cole, R.B.; Zhu, J.; Cai, Y.; French, A.D.; Johnson, G.P.; Laine, R.A. Incremented Alkyl Derivatives Enhance Collision Induced Glycosidic Bond Cleavage in Mass Spectrometry of Disaccharides. J Am Soc Mass Spectrom 2003, 14: 63-78.
Yoon, E.; Laine, R.A. Biological Mass Spectrom. 1992, 21: 479-485.
Laine, R.A.; Yoon, E.; Mahier, T.J.; Abbas, S.A.; deLappe, B.W.; Jain, R.K.; Matta, K.L. Biological Mass Spectrometry 1991, 20: 505-514.
Laine, R.A. Glycoconjugates: Overview and Strategy in Mass Spectrometry. Methods Enzymol. 1990, 193: 539-553 (ed.: J.A. McCloskey).
Laine, R.A. Methods in Enzymology 1989, 179: 157-164.
Laine, R.A.; Pamidimukkala, K.M.; French, A.L.; Hall, R.W.; Abbas, S.A.; Jain, R.K.; Matta, K.L. J. Am. Chem. Soc. 1988, 110: 6931-6939.








NON-reducing oligosaccharides: Trisaccharides can also be configured
with the trehalose-type aldose-1->1-aldose or the sucrose/raffinose non-reducing aldose-1->2-ketose internal linkage structure.
Larger oligosaccharides can be linked in a head-to-tail cyclodextrin
fashion.
These kinds of permutations would add a large number to this calculation.
At first blush, for the set of "cyclodextric" hexasaccharides, the linear
permutations number calculated below would be multiplied by 4 due to the
linkage term added by the extra head-to-tail linkage, making the
cyclodextrics alone close to 0.8 trillion.
However, since there would be no reducing and non-reducing terminals,
many of the cyclic "isomers" might be identical depending on the chosen
starting position. This will require some additional noodling.
To simplify, the scope of this lecture will be limited to the much more
common reducing-end saccharides.
There have been no reported estimates of all isomers resulting from
oligosaccharide branching; therefore this is a new approach.


To simplify and address the issue of carbohydrate
isomers in a biologically relevant size more
thoroughly, we will estimate all of the possible
isomers for a reducing hexasaccharide comprised
from a set of 6 hexoses in the D-configuration.
Since both D- and L-configurations of hexoses
appear in nature, especially in plants, fungi and
microbes, the possible isomers are even higher than
we are considering here (by a factor of $2^6$).


Although in this calculation we will only
consider the possible D-isomers, we must
consider that the pure L-forms generate an
equal number, and
the mixed D,L forms would add a multiple
of 64 to the total number.




LINEAR STRUCTURES:
The total number of possible structures, S*, of a D-hexose-containing
hexasaccharide begins with the value for a linear chain of 6 different non-repeating sugars ABCDEF, whose general formula is as follows:
A: $S^* = n! \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-1}$
where
n is the number of different hexoses in a string;
$n!$ is the linear permutation term, no sugar monomers repeated ($6! = 720$);
$2^{n}_{a}$ is the term for anomeric isomers ($2^6 = 64$);
$2^{n}_{r}$ is the term for ring size, pyranose or furanose ($2^6 = 64$);
$4^{n-1}$ is the linkage position term ($4^5 = 1024$).



While the hydroxyls on all 5 of carbons 2 - 6 can participate in
the linkage position when considering both pyranose and
furanose forms, pyranose excludes the 5 linkage and
furanose excludes the 4 linkage; therefore this part of the
linkage is taken into account by the ring size term, above.
This number for linear non-repeating structures of a
hexasaccharide considering only D stereochemistry would be:
A: S* = $6! \times 2^6 \times 2^6 \times 4^5$ = 3,019,898,880 (three billion!)











Table 1. Linear isomers of D-hexoses, each hexose used once.
_____________________________________________________________
Oligosaccharide size    Hexose set    Linear isomers
_____________________________________________________________
Monosaccharide          1                         4
Disaccharide            2                       128
Trisaccharide           3                     6,144
Tetrasaccharide         4                   393,216
Pentasaccharide         5                31,457,280
Hexasaccharide          6             3,019,898,880
_____________________________________________________________





If each or any of the members of the 6-sugar set could be
repeated, equation A becomes A' as follows:
A': $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-1}$
where n is the length of the chain in monomers, and E is
the number of different kinds of monomers (epimers) in the
set.
$E^n$ is the linear permutation term where individual sugar
types can be repeated within the chain.
The remaining terms are the same as in equation A.

In this case, the number of permutations for a
linear hexasaccharide would be as follows:

A': S* = $6^6 \times 2^6 \times 2^6 \times 4^5$ = 46,656 x 64 x 64 x 1024 = 195,689,447,424

Nearly 200 billion, an astonishing number!













Table 2. Linear isomers from a set of 1-6 D-hexoses.
_____________________________________________________________
Oligosaccharide size    Hexose set    Linear isomers
_____________________________________________________________
Monosaccharide          1                           4
Disaccharide            2                         256
Trisaccharide           3                      27,648
Tetrasaccharide         4                   4,194,304
Pentasaccharide         5                 819,200,000
Hexasaccharide          6             195,689,447,424
_____________________________________________________________
Note that all of the mono- to pentasaccharides added together comprise
less than 0.5% of the number for the total hexasaccharide isomers.
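The rows of Table 2 and the "less than 0.5%" remark can be reproduced with equation A'. The sketch assumes, as the table does, that the hexose set size E equals the chain length n in every row; the function name is illustrative.

```python
def linear_isomers_with_repeats(n: int) -> int:
    """E^n * 2^n * 2^n * 4^(n-1), taking E = n as in Table 2."""
    return n ** n * 2 ** n * 2 ** n * 4 ** (n - 1)


counts = {n: linear_isomers_with_repeats(n) for n in range(1, 7)}
for n, count in counts.items():
    print(n, f"{count:,}")

# Mono- through pentasaccharides together, relative to the hexasaccharide row
smaller_total = sum(counts[n] for n in range(1, 6))
fraction = smaller_total / counts[6]
print(f"{fraction:.2%}")
```

The smaller oligomers sum to 823,422,212 structures, which is indeed below half a percent of the hexasaccharide count.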
[Figure: Linear oligosaccharide isomers in D-hexoses. Vertical axis: number of oligosaccharide isomers, logarithmic scale from $10^{0}$ to $10^{12}$; horizontal axis: degree of linear polymerization, 1 to 6.]
Analysis, synthesis



A technological barrier to simple one-method
analytical differentiation among this many structures
is even more apparent than with trisaccharides, as
noted above.
Also, organic synthesis of one pure hexasaccharide
among $0.2 \times 10^{12}$ possible structures is a daunting
task.
Indeed, synthesis of a trisaccharide is estimated by
most oligosaccharide synthesis chemists to take 20
man-weeks, compared with 3 hours for a tripeptide.
There are few 95% yield reactions in oligosaccharide
synthesis (although some new automated machines alter this).

In addition, the above numbers would be increased by a large
number of biologically possible compounds with branched chains.

BRANCHED STRUCTURES:
The monosaccharide in position "F" is assigned to be the reducing-end throughout,
designated as "FR".
MONOSACCHARIDE BRANCHES:
For the singly branched compounds, examples are as follows:

B->C->D->E->FR
   |
   A
        I

B->C->D->E->FR
      |
      A
        II

B->C->D->E->FR
         |
         A
        III

B->C->D->E->FR
            |
            A
        IV

We will omit the arrows in the structures which are understood as pointing toward the
reducing end "FR".


Each of the above represented examples of
singly branched species can be considered as a
separate saccharide that has a fixed branch point
with regard to the location of the branching
sugar moiety within the chain, the branch being
movable among the hydroxyls on the branch
point sugar.
All of the monosaccharides in the hexamer are
then considered to contribute to isomers just as
the linear form, but with the branch positions
movable among carbons on each monomer
capable of forming branches within the chain.

The general formula for sets of oligosaccharide isomers
branched with a single monosaccharide along the core chain
would be:

B: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-3} \times [6 \times (n-2)]$

- where $n-2$ is the number of core monosaccharides that
can originate monosaccharide branches.
- $4^{n-3}$ are the permutations of positions of linkage on
unbranched monomers within the chain.
- $6 \times (n-2)$ are the possible arrangements of branches on
each of the hexopyranoses in the chain capable of
producing a branch ($n-2$).





These would be, for example, in I, above, the A,B branches
on C inserted as either A,B or B,A, respectively on the 2,3;
2,4; 2,6; 3,4; 3,6; or 4,6 positions of pyranoses and 2,3;
2,5; 2,6; 3,5; 3,6; and 5,6 positions of furanoses.
However, we assume that permutations of the ABC monomers
are included in the $E^n$ term; therefore 12 possibilities remain
for each possible branch position.
However, the pyranose/furanose term, $2^{n}_{r}$, includes the
alternate set of 6 structures. Since the 6 possible positions for
branching in each ring form account for 12 possibilities by the
multiple of 2 in the ring form term, the factor for single
branches should be $6 \times (n-2)$.

In this case, the number of isomers for
configurations I-IV, above, taken together would be:

B: $S^*_{B1} = 6^6 \times 2^6 \times 2^6 \times 4^3 \times [6 \times (6-2)]$ =
46,656 x 64 x 64 x 64 x 24 = 293,534,171,136


This first branching example gives nearly 300
billion additional possible structures!
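Equation B is easy to evaluate for any chain length. The sketch below is one possible transcription of the formula; the function name and the guard for n below 3 are assumptions added for illustration.

```python
def single_monosaccharide_branch_isomers(n: int, hexose_set: int) -> int:
    """Equation B: E^n * 2^n * 2^n * 4^(n-3) * [6 * (n-2)]."""
    if n < 3:
        raise ValueError("a branch needs at least a trisaccharide")
    shared = hexose_set ** n * 2 ** n * 2 ** n  # E^n, anomeric term, ring size term
    unbranched_linkages = 4 ** (n - 3)          # linkage positions on unbranched monomers
    branch_arrangements = 6 * (n - 2)           # 6 patterns on each of n-2 branchable sugars
    return shared * unbranched_linkages * branch_arrangements


print(f"{single_monosaccharide_branch_isomers(6, 6):,}")  # 293,534,171,136
print(f"{single_monosaccharide_branch_isomers(3, 3):,}")  # 10,368
```

For n = 3 and a set of 3 hexoses the same expression returns the 10,368 branched trisaccharides found earlier, which is a useful consistency check.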






DISACCHARIDE BRANCHES:
For hexasaccharides with a single disaccharide branch,
the set would appear as follows:

C-D-E-FR
  |
  AB
     V

C-D-E-FR
    |
    AB
     VI

C-D-E-FR
      |
      AB
     VII
etc.








[Structures I-IV (single monosaccharide branch) and V-VII (single disaccharide branch), shown together for comparison.]

V is the same as II, where ABDEFR can be
considered the "core" structure with a single
monosaccharide branch on D, however, VI and VII
are novel arrangements.



The formula for this set would be
C: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-3} \times [6 \times (n-4)]$,
where disaccharide branches that generate
new compounds beyond single branches
already considered can only happen on n-4 of
the monomers. Tetrasaccharides and below
would not produce novel compounds.



For hexasaccharides the numerical total is
46,656 x 64 x 64 x 64 x 12 = 146,767,085,568
(novel structures beyond linear and single-branched hexasaccharides made up of 6
different hexoses).

TRISACCHARIDE BRANCHES to the core
chain:

D-E-FR
  |
  ABC
     VIII

D-E-FR
    |
    ABC
     IX
etc.

VIII is the same as III, sugar "D" being the single branch on the
core ABCEF, and IX is the same as VII, with a disaccharide
branch on the reducing end sugar "FR"; therefore new compounds
only occur in heptasaccharides and above.

and the formula is

D: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-3} \times \left[6 \times \frac{(n-6) + |n-6|}{2}\right]$.

For a hexasaccharide or smaller, no new compounds are
generated beyond those already considered, therefore
the result is 0.





TETRASACCHARIDE BRANCHES:

E-FR
  |
  ABCD
     X

X is the same as IV with a monosaccharide branch on
"FR", and only produces new compounds with
octasaccharides and above:

This generates the formula:

E: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-3} \times \left[6 \times \frac{(n-7) + |n-7|}{2}\right]$.

For a heptasaccharide and smaller, this is numerically 0.
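The bracketed term in equations D and E uses the identity that (x + |x|)/2 equals x when x is positive and 0 otherwise, which switches the whole term off for short chains. A minimal sketch, with `ramp` and the two function names chosen here only for illustration:

```python
def ramp(x: int) -> int:
    """(x + |x|) / 2: equals x when x is positive, 0 otherwise."""
    return (x + abs(x)) // 2


def trisaccharide_branch_isomers(n: int, hexose_set: int) -> int:
    """Equation D: novel structures appear only for n >= 7."""
    shared = hexose_set ** n * 2 ** n * 2 ** n
    return shared * 4 ** (n - 3) * 6 * ramp(n - 6)


def tetrasaccharide_branch_isomers(n: int, hexose_set: int) -> int:
    """Equation E: novel structures appear only for n >= 8."""
    shared = hexose_set ** n * 2 ** n * 2 ** n
    return shared * 4 ** (n - 3) * 6 * ramp(n - 7)


print(trisaccharide_branch_isomers(6, 6))    # 0
print(tetrasaccharide_branch_isomers(7, 7))  # 0
print(trisaccharide_branch_isomers(7, 7) > 0)  # True
```

The functions are meant for n of 3 or more, where every exponent is non-negative.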

DI-BRANCHED COMPOUNDS:

Two single branches on two different core monosaccharides may
appear as follows:

C-D-E-FR
  | |
  A B
     XI

C-D-E-FR
  |   |
  A   B
     XII

C-D-E-FR
    | |
    A B
     XIII
etc.
All three are novel arrangements.





The factor of 6 different branch combinations now needs to be applied to two
of the monosaccharides in the core, while the anomeric and other
permutations remain the same.
F: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-4} \times [6^2 \times ((n-4) + (n-5) + \dots + (n-(n-1)))]$
where $6^2$ is the term considering 2 branches.
The term for permutations of locations of 2 branches along the
core is $(n-4) + (n-5) + \dots + (n-(n-1))$.
This formula is not valid for tetrasaccharides or below, because the
series begins with $n-4$ and ends with $n-(n-1) = 1$. Pentasaccharides
give a value of 1 ($n-4 = 1$). Hexasaccharides give $(n-4 = 2) + (n-5 = 1)$, for a value of 3. Likewise, heptasaccharides
would give a value of 6.

Numerically, for hexasaccharides with two monosaccharide branches:
46,656 x 64 x 64 x 16 x 36 x 3 = 330,225,942,528
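The location series in equation F can be evaluated directly. The sketch also compares it with a simple count of pairs, under the assumption (an interpretation, not stated in the lecture) that the two branches sit on 2 of the n-3 core residues able to carry a branch. All names are illustrative.

```python
from itertools import combinations


def branch_location_series(n: int) -> int:
    """(n-4) + (n-5) + ... + 1, the location term in equation F."""
    return sum(range(1, n - 3))


def branch_location_pairs(n: int) -> int:
    """Assumed reading: choose 2 of the n-3 core residues that can carry a branch."""
    return len(list(combinations(range(n - 3), 2)))


def two_monosaccharide_branch_isomers(n: int, hexose_set: int) -> int:
    """Equation F: E^n * 2^n * 2^n * 4^(n-4) * [6^2 * location series]."""
    if n < 5:
        raise ValueError("equation F applies to pentasaccharides and larger")
    shared = hexose_set ** n * 2 ** n * 2 ** n
    return shared * 4 ** (n - 4) * 6 ** 2 * branch_location_series(n)


for n in (5, 6, 7):
    print(n, branch_location_series(n), branch_location_pairs(n))
print(f"{two_monosaccharide_branch_isomers(6, 6):,}")  # 330,225,942,528
```

For penta-, hexa- and heptasaccharides both counts give 1, 3 and 6, matching the values quoted for the series.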


The general formula representing the number of
permutations with B monosaccharide branches would be:

F': $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-(B+2)} \times [6^B \times (\text{term for permutation of branches})]$




A heptasaccharide is the smallest
compound capable of triple single branches,
as in:

D-E-F-G-(reducing end)
  | | |
  A B C

Two single branches on the same core
monosaccharide (trisubstituted or triple-branched)
represent another novel set:

  B
  |
C-D-E-FR
  |
  A
     XIV

    B
    |
C-D-E-FR
    |
    A
     XV

      B
      |
C-D-E-FR
      |
      A
     XVI
etc.

Branch possibilities for triply substituted monosaccharides, including both
pyranose and furanose forms with exclusions: (2,3,4); (2,3,6); (2,4,6); (3,4,6) are
possible with pyranose branching structures, and (2,3,5); (2,3,6); (2,5,6); (3,5,6) with furanose branching structures (8
configurations, of which we only need to consider 4 that are novel due to the ring
size factor $2^{n}_{r}$). Each one of these can have 6 different permutative arrangements,
such as ABC, ACB, BAC, BCA, CAB, CBA; however, these are not additional
permutations beyond those covered by the term $E^n$. Each of $n-3$ locations (3 in a
hexasaccharide) can be tri-substituted in this way, therefore the term $4 \times (n-3)$ as follows:

G: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-4} \times [4 \times (n-3)]$

This formula does not function for trisaccharides or lower.

For a hexasaccharide:

46656 * 64 * 64 * 16 * 12 = 36,691,771,392
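The factor of 4 in equation G can be seen by listing the three-position substitution patterns for each ring form. The sketch assumes the same available hydroxyls as listed above (2, 3, 4, 6 for pyranose; 2, 3, 5, 6 for furanose); the names are illustrative.

```python
from itertools import combinations

PYRANOSE_HYDROXYLS = (2, 3, 4, 6)
FURANOSE_HYDROXYLS = (2, 3, 5, 6)

pyranose_triples = list(combinations(PYRANOSE_HYDROXYLS, 3))
furanose_triples = list(combinations(FURANOSE_HYDROXYLS, 3))


def trisubstituted_isomers(n: int, hexose_set: int) -> int:
    """Equation G: E^n * 2^n * 2^n * 4^(n-4) * [4 * (n-3)]."""
    if n < 4:
        raise ValueError("equation G does not apply to trisaccharides or smaller")
    shared = hexose_set ** n * 2 ** n * 2 ** n
    # Only one ring form's patterns are counted; the ring size term covers the other.
    return shared * 4 ** (n - 4) * len(pyranose_triples) * (n - 3)


print(pyranose_triples)  # [(2, 3, 4), (2, 3, 6), (2, 4, 6), (3, 4, 6)]
print(furanose_triples)  # [(2, 3, 5), (2, 3, 6), (2, 5, 6), (3, 5, 6)]
print(f"{trisubstituted_isomers(6, 6):,}")  # 36,691,771,392
```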




OTHER BRANCHING POSSIBILITIES:
Double disaccharide branches can occur on n-5
monosaccharide core members;
hexasaccharides are the smallest oligosaccharides for which this
set produces new compounds. F is trisubstituted in this example.
This could also be construed as a single monosaccharide branch
and a disaccharide branch on a trisaccharide core.

  AB
  |
E-FR
  |
  CD
     XVII




H: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{n-4} \times [4 \times (n-5)]$
Not valid for pentasaccharides or below.

for a hexasaccharide:
 46656 * 64 * 64 * 16 * 4 = 12,230,590,464


A combination of one monosaccharide branch and one disaccharide
branch on different core monosaccharides:

  AB
  |
D-E-FR
    |
    C
     XVIII

    AB
    |
D-E-FR
    |
    C
     XIX
etc.
XVIII is the same as XIII where the core is ABEF, while XIX
has the core DEF or ABF where F is the triply branched
reducing end and is therefore the same as XVII.

We need also consider a new class of branched compounds
where we have a single, itself branched, trisaccharide branch,
as follows:

D-E-FR
  |
  C
 / \
A   B
     XX

D-E-FR
    |
    C
   / \
  A   B
     XXI


Examination shows XX to have the core ACEF, or BCEF,
the same as XIII and XVIII, while XXI, being branched
on the reducing end by one disaccharide and a branched
trisaccharide is the same compound as XIX. As the
saccharide core length grows, new compounds can be
formed in this series.
This also introduces another form of two branched
structures in the same molecule. Thus, for single
trisaccharide-branched oligosaccharides, 2 branches allow
$6^2$ different substitution patterns each, as in equation F,
above. Not valid for tetrasaccharides or smaller.




J: $S^* = E^n \times 2^{n}_{a} \times 2^{n}_{r} \times 4^{(n-5)} \times 6^{(1+(n-5))}$
For a hexasaccharide:
46,656 x 64 x 64 x 4 x 36 = 27,518,828,544
The first linkage component, $4^{(n-5)}$, represents singly
substituted core saccharides; the second term, $6^{(1+(n-5))}$,
represents the branch permutations of the single
trisaccharide branch ("1+") and the disubstituted "core"
monosaccharide ("(n-5)"). Larger saccharides will present
much more complex branching permutations.
Still, one new compound can be envisioned which is
a variation on XXI which also has D and E
connected to the reducing-end F, as:

  D
  |
E-FR
  |
  C
 / \
A   B
Compound XXII
This compound also has a triple branch on F and opens the door to another form of
triple-branched versions of oligosaccharides where a monosaccharide and a branched
trisaccharide are both substituted onto a core monosaccharide.

Triply branched FR can have 4 allowed variations, as in
equation G above, and monosaccharide C in this
illustration gives 6 variations, as in equation B above. No
singly substituted hexoses occur in saccharides of this
general structure smaller than 7-mers. Not valid for
pentasaccharides or lower.
K: S* = E^{n} * 2^{na} * 2^{nr} * 4^{(n-6)} * [4*(n-5)] * 6
For a hexasaccharide:
46656 * 64 * 64 * 1 * 4 * 6 = 4,586,471,424


Tetra-branched versions are also possible:
ED
\/
FR
/\
A BC
XXIII

AB
\/
E-FR
/\
CD
XXIV

Tetrabranched hexoses are completely substituted, at positions
2,3,4,6 for pyranoses or 2,3,5,6 for furanoses.
Since no other branching is possible, the original term N!
for substitution permutations covers all of the possibilities,
except that the disaccharide on structure XXIII could occupy
4 different sites on FR, creating another factor of 4 in that
structure.
L: S* = E^{n} * 2^{na} * 2^{nr} * (4^{(n-5)} * [n-4]) * 4^{(n-5)}
Not valid for tetrasaccharides or below.
The penultimate term [n-4] shows the number of core saccharides
capable of tetrasubstitution, while the last term shows substitution of
the disaccharide AB on the hydroxyls of F in XXIII.
In heptasaccharides and above, one could also envision a
trisaccharide branch inserted in the compound analogous
to XXIII, while a disaccharide branch would find itself in the analog
to XXIV. Therefore, for higher oligosaccharides, extra terms need to be
added to equation L.
For a hexasaccharide:
46656 * 64 * 64 * 4 * 2 * 4 = 6,115,295,232
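Equation L can be checked the same way. The sketch below uses the hypothetical names `isomers_eq_l` and `e`, and assumes that both occurrences of the term written 4(n-5) are powers of 4, which matches the worked factors 4 * 2 * 4 at n = 6. It covers only the terms written in equation L, so it should not be trusted for heptasaccharides and above, where the source says extra terms are needed.

```python
def isomers_eq_l(n: int, e: int) -> int:
    """Evaluate the written terms of equation L (tetra-branched residues).

    Assumptions: e is the value of E; the anomer and ring-size factors
    are each 2**n; both linkage terms are 4**(n-5).
    """
    if n < 5:
        # The source marks this case as not valid for tetrasaccharides or below.
        return 0
    shared = e**n * 2**n * 2**n              # E^n * 2^na * 2^nr
    tetra_sites = 4 ** (n - 5) * (n - 4)     # [n-4] cores capable of tetrasubstitution
    disaccharide_sites = 4 ** (n - 5)        # placement of disaccharide AB on F in XXIII
    return shared * tetra_sites * disaccharide_sites


print(f"{isomers_eq_l(n=6, e=6):,}")  # 6,115,295,232
```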
This set of equations seems to cover all possibilities for a
hexasaccharide or smaller where F is the reducing end or is
attached to an aglycon.

Hepta-, octa-, and nonasaccharides offer possibilities of
higher orders of branching. Decasaccharides offer the first
possibility of quadruple-branched saccharides:

A
\
C
/\
B D
/\
E F
/\
G H
/\
I J-R

(or three tri-branched residues):

ABC
| | |
G-H-I-J-(reducing end)
| | |
DEF





Higher orders of branching can be envisioned.
However, most biological activities are contained within a
reasonably sized proteinaceous binding site of 6 sugars, or
usually fewer, as exemplified by antibodies, enzymes (lysozyme),
heparinoids, or lectins (selectins).
There are a few examples of proteins requiring higher
oligomers for activity, e.g. a few enzyme recognition sites in
the N-linked anabolic pathway for glycoprotein synthesis,
which apparently recognize precursors as large as 14 sugars.



Taking all of the above calculations together,
the total number of permutations for a
hexasaccharide can be enumerated.
The master equation is the sum of
all equations A' through L;
negative values obtained from the calculations
should be regarded as zero.

Totals taken from A' to L for hexasaccharides made up of D-hexoses:
A'        195,689,447,424
B         293,534,171,136
C         146,452,512,768
D                       0
E                       0
F         330,225,942,528
G          36,691,771,392
H          12,230,590,464
J          27,518,828,544
K           4,586,471,424
L           6,115,295,232
Total:  1,053,045,031,000
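The total can be re-derived by summing the listed entries. A minimal sketch follows; the dictionary name `terms` is illustrative, and the values are copied from the list above. The exact sum differs from the quoted total only by rounding to the nearest thousand.

```python
# Per-equation hexasaccharide totals, copied from the list above.
terms = {
    "A'": 195_689_447_424,
    "B": 293_534_171_136,
    "C": 146_452_512_768,
    "D": 0,
    "E": 0,
    "F": 330_225_942_528,
    "G": 36_691_771_392,
    "H": 12_230_590_464,
    "J": 27_518_828_544,
    "K": 4_586_471_424,
    "L": 6_115_295_232,
}

total = sum(terms.values())
print(f"{total:,}")             # 1,053,045,030,912
print(f"{round(total, -3):,}")  # 1,053,045,031,000 (rounded to the nearest thousand)
```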
Without considering L sugars or nonreducing forms, the total number of
compounds from a hexasaccharide
composed of 6 different hexoses will be
the total of the above, more than 10^{12}
possible compounds.
Including the mirror image L sugar forms
as stereochemical isomers within this set
would increase this number by a factor of
64, to more than 64 trillion.
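The factor of 64 can be illustrated with a short calculation. Reading 64 as 2^6, one D or L choice for each of the six residues, is an assumption made here and not something the source states. The constant name `D_HEXOSE_TOTAL` is illustrative, and its value is the exact sum of the A' through L entries.

```python
D_HEXOSE_TOTAL = 1_053_045_030_912  # exact sum of the A' through L entries
RESIDUES = 6

# Assumption: each residue independently takes the D or the L form.
dl_factor = 2 ** RESIDUES
with_l_forms = D_HEXOSE_TOTAL * dl_factor

print(dl_factor)             # 64
print(f"{with_l_forms:,}")   # 67,394,881,978,368
```

The result, roughly 6.7 x 10^13, is consistent with "more than 64 trillion".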

Table 3: Isomers Including Branches and Repeating Hexoses
_____________________________________________________________
Oligosaccharide size          Linear and Branched Isomers
_____________________________________________________________
Monosaccharide                                          4
Disaccharide                                          256
Trisaccharide                                      43,200
Tetrasaccharide                                 7,602,176
Pentasaccharide                             2,633,600,000
Hexasaccharide                          1,053,045,031,000
_____________________________________________________________
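To see how quickly the Table 3 counts grow, the increase per added residue can be expressed in logs (powers of ten). The sketch below uses only the table values; the names `table3` and `log_steps` are illustrative.

```python
import math

# (size, isomer count) pairs copied from Table 3.
table3 = [
    ("Monosaccharide", 4),
    ("Disaccharide", 256),
    ("Trisaccharide", 43_200),
    ("Tetrasaccharide", 7_602_176),
    ("Pentasaccharide", 2_633_600_000),
    ("Hexasaccharide", 1_053_045_031_000),
]

# log10 of the ratio between each size and the one before it.
log_steps = [
    math.log10(count / previous)
    for (_, previous), (_, count) in zip(table3, table3[1:])
]

for (name, _), step in zip(table3[1:], log_steps):
    print(f"{name}: +{step:.2f} logs over the previous size")
```

On these numbers each added residue contributes between about 1.8 and 2.6 logs.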
[Figure 1: Oligosaccharide Isomers from D-Hexoses. Log-scale plot of isomers (10^{0} to 10^{17}) against degree of polymerization (1 to 8) for three series: branched and linear oligosaccharides, linear oligosaccharides, and peptides. Annotation: Octasaccharide Isomers Exceed 10e+17.]




Figure 1 shows the data from Tables 2 and 3
plotted along with data from peptides of the
same length.
Extrapolation in Figure 1 shows that linear and
branched totals for heptasaccharides would
generate around 10^{15} compounds, and
octasaccharides would generate at least 10^{18}.
Divergence from the linear forms increases
from 1 log at heptamers to 2 logs at octamers.
The divergence is due to an increase in
branching types. A mole of isomers exists above a degree of polymerization of 8.
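The gap between saccharides and peptides of the same length can be sketched numerically. The peptide count used here, 20^n for the 20 standard amino acids, is an assumption about how the peptide series was computed and is not stated in the source. The names `AMINO_ACIDS`, `oligosaccharides` and `gaps` are illustrative, and the saccharide counts are the Table 3 values.

```python
import math

AMINO_ACIDS = 20  # assumption: peptide sequences counted as 20**n

# Linear and branched isomer counts from Table 3, keyed by number of residues.
oligosaccharides = {
    1: 4,
    2: 256,
    3: 43_200,
    4: 7_602_176,
    5: 2_633_600_000,
    6: 1_053_045_031_000,
}

# Gap in logs between saccharide isomers and peptide sequences of equal length.
gaps = {
    n: math.log10(count / AMINO_ACIDS**n)
    for n, count in oligosaccharides.items()
}

for n, gap in gaps.items():
    print(f"n={n}: peptides={AMINO_ACIDS**n:,}  gap={gap:+.2f} logs")
```

Under this assumption the saccharide count overtakes the peptide count at the trisaccharide and leads by more than 4 logs at the hexasaccharide.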

While nature has not yet confounded us
with numbers of compounds of such
magnitude, this brings little comfort to
the analyst or synthetic chemist who
must, after all, come to the conclusion
that the oligosaccharide in question is,
absolutely, the single correct structure out
of billions.

For oligosaccharide building blocks, organisms
possess a larger number of possibilities than for
peptides.
There exist more than 50 types of sugars without
considering non-saccharidic substitutions.
Sugars are found substituted with acyl, alkyl,
pyruvyl, sulfate, sulfonate, phosphate,
phosphonate, and other groups, any one of which
would raise the possible isomers of only a singly
substituted saccharide to a number higher than
the one we have calculated.

That is, if one allows a single methyl group, for
example, to be substituted anywhere on a
hexasaccharide, 2.4 x 10^{13} new compounds could be
envisioned.
There are 24 free hydroxyls on each hexasaccharide.
Therefore the already calculated total for
hexasaccharides can be multiplied by a factor of 24.
Each one of these would present a different antigen to
an antibody, for example. The human antibody
diversity potential is estimated above 10^{12}.
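The methyl-group estimate is a single multiplication. The sketch below shows it two ways: with the hexasaccharide total rounded down to 10^12, which gives the quoted 2.4 x 10^13, and with the exact sum of the A' through L entries, which gives a slightly larger figure. The constant names are illustrative.

```python
FREE_HYDROXYLS = 24                 # free hydroxyls per hexasaccharide, as stated above
ROUNDED_TOTAL = 10**12              # hexasaccharide total rounded down to a power of ten
EXACT_TOTAL = 1_053_045_030_912     # exact sum of the A' through L entries

rounded_estimate = FREE_HYDROXYLS * ROUNDED_TOTAL
exact_estimate = FREE_HYDROXYLS * EXACT_TOTAL

print(f"{rounded_estimate:.1e}")  # 2.4e+13
print(f"{exact_estimate:.1e}")    # 2.5e+13
```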
A biological “code”

There is a very high number of possible
epitopes for establishment of a biological
recognition "code" consisting of
the binding pocket of a specific protein, coded as a
lectin binding site, on the one hand, and
the complex sugar structure on the other.
The “code” is embedded as a set of sequentially acting
glycosyl transferases where the “program” may be
differently expressed in alternative tissues or in
certain conditions.
The above calculation shows the most complex
known biologically recognizable chemical “code”
in a short sequence yet uncovered in nature.

The set model for this project can be
described as a series of convex epitopes
that have a direction; that is, they have one
or more beginning (non-reducing terminal)
termini and only one (usually aldehyde or
ketone) ending terminus.
The latter is conventionally called the
“reducing end” and is written with this
"reducing" terminus monomer to the right.
The set





The epitopes can be populated with a set of epimeric
monomers of defined size.
These monomers can be linked to each other at one
position on the left hand and usually 4 different
positions on the right.
For each of the 4 positions there is a relation above
(β) and below (α) the plane of the monomeric ring
(D-forms).
Each monomer can exist as a 5- or 6-sided polygon.
There is a sequence order to all of these parameters.
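One way to read the set description above is as a product of independent choices for an unbranched chain: which monomer sits at each position, α or β at each anomeric centre, a 5- or 6-membered ring for each monomer, and one of 4 linkage positions for each of the n-1 bonds. The sketch below is this interpretation only, not the source's stated derivation, and the names `linear_isomers` and `e` are hypothetical. With `e` set equal to `n`, the product matches the Table 3 entries for the monosaccharide and disaccharide (4 and 256), where no branching is possible, and the A' entry in the hexasaccharide totals.

```python
def linear_isomers(n: int, e: int) -> int:
    """Count unbranched n-mers under the set model read as independent choices.

    Assumption: e monomer types are available at each of the n positions.
    """
    monomer_choices = e ** n        # which epimeric monomer at each position
    anomer_choices = 2 ** n         # above (beta) or below (alpha) the ring plane
    ring_choices = 2 ** n           # 5- or 6-sided ring for each monomer
    linkage_choices = 4 ** (n - 1)  # 4 attachment positions per glycosidic bond
    return monomer_choices * anomer_choices * ring_choices * linkage_choices


print(linear_isomers(1, 1))           # 4
print(linear_isomers(2, 2))           # 256
print(f"{linear_isomers(6, 6):,}")    # 195,689,447,424
```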

This examination of the carbohydrate isomers is
exhaustive for hexasaccharides and lower, and
covers most isomers for compounds up to
octasaccharides, with the proviso that all possible
branched compounds are to be considered and their
terms are to be added.
The numbers are astronomical, showing a graph
that exceeds 2 logs per monomer through
pentasaccharides and grows to 3 logs per monomer
above heptasaccharides.
The values obtained are especially surprising for
such a short oligomer sequence.
Biological IT


Because proteins can evolve more rapidly than
carbohydrates (which must have a substantial
enzyme change to add a new sugar), saccharide
structures are likely to be very conserved over
evolution when compared with proteins whose
specificity could change with a single nucleotide
(thus amino acid) mutation.
An example exists in the literature, however, where
a few amino acid changes altered the specificity of
the transferase that confers A and B blood types on
humans, switching it from galactose to
N-acetylgalactosamine (work of Hakomori et al.).


Those carbohydrate sequences in metazoans with
functions that are conserved will probably be
preserved across orders, such as the selectins and
heparinoids in mammals.
There is, obviously, requisite chemistry for much
further biological evolution in carbohydrate-protein
and, potentially, carbohydrate-carbohydrate and
carbohydrate-nucleotide recognition systems.