A Calculation of All Possible Oligosaccharide Isomers, Both Branched and Linear, Yields 1.05 × 10^12 Structures for a Reducing Hexasaccharide: The Isomer Barrier to Development of Single-Method Saccharide Sequencing or Synthesis Systems

Carbohydrate IT

Because of their unique branching structures, carbohydrates carry an evolutionary potential for information content that is several orders of magnitude higher in a short sequence than that of any other biological oligomer. This study addresses the informational potential of biological recognition systems in which specifically binding cognate protein receptors, such as lectins, recognize complex carbohydrate ligands for targeted activities.

The evolution of receptor/ligand cognate pairs involving carbohydrates is complex and probably very slow. Single point mutations in glycosyltransferase proteins are unlikely to alter sugar structures. The exception is where a minor amino acid change alters discrimination among closely related sugars within an otherwise identical structure [1].

[1] Yamamoto, F.; Hakomori, S.-i. J. Biol. Chem. 1990, 265, 19257-19262.

The polypeptide-based carbohydrate recognition information is carried in one or more genes. For biological recognition of just one additional sugar on an existing structure to evolve, both of the following may be required:
1) mutation of the peptide sequence of an existing glycosyltransferase (more likely), or evolution of a novel glycosyltransferase, to make a novel carbohydrate structure; and
2) evolution of a new binding site on an existing lectin, or of a new lectin, to recognize the new structure.

The complex carbohydrate cognate is encoded in a specifically ordered set of glycosyltransferase genes. Each N-1 precursor is part of the recognition system in the binding site of the "next in order" glycosyltransferase, which transfers the next sugar in the sequence from a high-energy donor.
Understanding the evolution, genetic control and organization of these newly discovered carbohydrate-protein recognition systems will be a significant research challenge.

In all biological heteropolymers, the linear sequence of monomers constitutes, in some manner, a biological code. Proteins can conform in a concave or convex manner to recognize all other biological molecules, and this ability extends to complex carbohydrates. Lectins, enzymes and antibodies can show exquisite binding specificity for many features of carbohydrate ligands: shape, charge, epimers, anomers, linkage positions, ring size, branching and monosaccharide sequence. The largest unit these proteins recognize is usually a hexamer or smaller [2].

[2] Cisar, J.; Kabat, E.A.; Dorner, M.M.; Liao, J. J. Exp. Med. 1975, 42, 435-459. Takeo, K.; Kabat, E.A. J. Immunol. 1978, 121, 2305-2310. Smith-Gill, S.J.; Rupley, J.A.; Pincus, M.R.; Carty, R.P.; Scheraga, H.A. Biochemistry 1984, 23, 993-997.

Carbohydrate sequences have characteristic solution structures. Although these are dynamic, nuclear Overhauser effect (NOE) NMR and molecular modeling show them to be populated mainly by minimum-energy three-dimensional conformations [3]. Oligosaccharide haptens are rather more rigid than short peptides because of steric crowding [4], so they must be envisioned in three-dimensional space for specific recognition by proteins.

[3] Miller, K.E.; Mukhopadhyay, C.; Cagas, P.; Bush, C.A. Biochemistry 1992, 31, 6703-6709.
[4] Cumming, D.A.; Carver, J.P. Biochemistry 1987, 26, 6664-6676. French, A.D.; Mouhous-Riou, N.; Perez, S. Carbohydr. Res. 1993, 247, 51-62. Poppe, L.; Dabrowski, J.; von der Lieth, C.W.; Koike, K.; Ogawa, T. Eur. J. Biochem. 1990, 189, 313-325.
Carbohydrate polymers themselves often contain a complex, multifaceted sequence. Specific proteins can bind to relatively short subsets, or haptens, within longer saccharide sequences, as in heparin [5].

[5] Atha, D.H.; Lormeau, J.C.; Petitou, M.; Rosenberg, R.D.; Choay, J. Biochemistry 1987, 26, 6454-6461. Riesenfeld, J.; Hook, M.; Bjork, I.; Lindahl, U.; Ajaxon, B. Fed. Proc. 1977, 36, 39-43. van Boeckel, C.A.A.; Petitou, M. Angew. Chem. Int. Ed. 1993, 32, 1671-1818.

A lectin or other carbohydrate-binding protein can act in several ways:
- in control mechanisms, such as the selectins in inflammation;
- as signals for polypeptide location within the cell, such as lysosomal protein markers [6];
- in single-celled organisms, as recognition markers for predation or adhesion;
- in viruses, to target cell-surface structures for adhesion and invasion;
- in metazoans, for specific cell-surface recognition of one cell by another.

[6] Reitman, M.L.; Kornfeld, S. J. Biol. Chem. 1981, 256, 11977-11980.

Where multimeric intercellular binding occurs, a large collection of low-avidity interactions can dramatically increase binding strength through multivalency of adhesive sites [7]. A monomer may have a millimolar binding constant, a dimer a micromolar one and a trimer a nanomolar one. Higher orders produce a "velcro effect". Specific spacing of carbohydrate moieties within a structure may confer binding several orders of magnitude tighter.

[7] Lee, Y.C. Ciba Found. Symp. 1990, 145, 80-95.

Even higher complexity is possible. Sets of binding proteins may bind patterns of carbohydrate sets with low avidity, forming recognition systems that could play a powerful role in intercellular "sociology". Such roles may arise during development [8], in the immune system [9], in parasitology [10] and in other microbial pathogenesis [11].

[9] Brandley, B.K.; Swiedler, S.J.; Robbins, P.W. Cell 1990, 63, 861-870. Aruffo, A.; Stamenkovic, I.; Melnick, M.; Underhill, C.B.; Seed, B. Cell 1990, 61, 1303-1310. Polley, M.J.; Phillips, M.L.; Wayner, E.; Nudelman, E.; Singhal, A.K.; Hakomori, S.-i.; Paulson, J.C. Proc. Natl. Acad. Sci. USA 1991, 88, 6224-6229. Yuen, C.T.; Lawson, A.M.; Chai, W.; Larkin, M.; Stoll, M.S.; Stuart, A.C.; Sullivan, F.X.; Ahern, T.J.; Feizi, T. Biochemistry 1992, 31, 9126-9131.
[10] Feizi, T. Nature 1985, 314, 53-57. Feizi, T. Adv. Exp. Med. Biol. 1988, 228, 317-329. Friedman, M.J.; Fukuda, M.; Laine, R.A. Science 1985, 228, 75-77.
[11] Srnka, C.A.; Tiemeyer, M.; Gilbert, J.H.; Moreland, M.; Schweingruber, H.; de Lappe, B.W.; James, P.G.; Gant, T.; Willoughby, R.E.; Yolken, R.H.; Nashed, M.A.; Abbas, S.A.; Laine, R.A. Virology 1992, 190, 794-805.

Numerous reviews and recent papers describe new discoveries in carbohydrate-based recognition systems. These include the selectins [8], glycosaminoglycan clotting factors [12], tumor markers [13], parasite recognition systems [9], Rhizobium nodulation systems [14], plant pathogen recognition [15] and others [16].

[8], [9] op. cit.
[12] van Boeckel, C.A.A.; Petitou, M. Angew. Chem. Int. Ed. 1993, 32, 1671-1718.
[13] Hakomori, S. Am. J. Clin. Pathol. 1984, 82, 635-648. Hoff, S.D.; Irimura, T.; Matsushita, Y.; Ota, D.M.; Cleary, K.R.; Hakomori, S.-i. Arch. Surg. 1990, 125, 206-209.
[14] Truchet, G.; Roche, P.; Lerouge, P.; Vasse, J.; Camut, S.; de Billy, F.; Prome, J.-C.; Denarie, J. Nature 1991, 351, 670-673. Fisher, R.F.; Long, S.R. Nature 1992, 357, 655-660.
[15] Maniara, G.; Laine, R.A.; Kuc, J. Physiol. Plant Pathol. 1984, 24, 177-186.
[16] Karlsson, K.A. Chem. Phys. Lipids 1986, 42, 153-175.

Banausic (commercial) motives and research by new start-up companies have recently driven many new discoveries in immune-cell recognition systems (the selectins and others). Current molecular understanding of this system alone augurs a major breakthrough in immunochemistry.
Taken together, these findings herald a new excitement in carbohydrate biochemistry.

A growing specialty area of biochemistry is concerned with the biology of protein recognition of specific carbohydrates. Raymond Dwek coined the name "Glycobiology" for this field [17]. The name has also been adopted by a journal and by a North American scientific society of more than 1,000 members (formerly the Society for Complex Carbohydrates).

[17] Opdenakker, G.; Rudd, P.M.; Ponting, C.P.; Dwek, R.A. FASEB J. 1993, 7, 1330-1337. Rademacher, T.W.; Parekh, R.B.; Dwek, R.A. Annu. Rev. Biochem. 1988, 57, 785-838.

What, then, are the structural components that make carbohydrates so complex? And what is the magnitude of the potential information content that higher organisms have evidently exploited?

With some exceptions, saccharide-binding proteins recognize an oligomer of six sugars or fewer. Consider a hexasaccharide built from a set of 6 different sugars (hexoses in this example), each of which may be used more than once. More than 1.05 × 10^12 such structures are possible. In contrast, a set of 6 different amino acids, repeated in all permutations, can generate only 6^6 = 46,656 different hexapeptides. That is more than 7 orders of magnitude fewer than the carbohydrates.

Carbohydrates have 8 major structural features:
1) epimers, including D and L forms;
2) linear sequence of the core and of linear branches;
3) ring size, usually 5- or 6-membered;
4) anomeric configuration, α or β;
5) linkage position (1->2, 1->3, 1->4, 1->5, 1->6, etc.);
6) branching positions and arborization;
7) reducing-terminal attachment (glycoside, acetal, ketal);
8) derivatives (ester, ether, phosphate, sulfate, lactyl, etc.).
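The hexapeptide comparison made just before the list of structural features is easy to check numerically. This minimal Python sketch takes the text's hexasaccharide total of about 1.05 × 10^12 as a given input. It measures the gap as the base-10 logarithm of the ratio.

```python
import math

# Hexapeptides from a set of 6 amino acids, each usable any number of times.
hexapeptides = 6 ** 6

# Approximate hexasaccharide total quoted in the text (D-hexoses, reducing end).
hexasaccharides = 1.05e12

# Size of the gap in orders of magnitude: log10 of the ratio.
gap = math.log10(hexasaccharides / hexapeptides)

print(hexapeptides)   # 46656
print(round(gap, 2))  # 7.35
```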
All of these features produce large numbers of equal-mass isomers in a short sequence, and specifically binding proteins could potentially recognize each one.

Nathan Sharon's collected lectures [18] attribute the first calculation of oligosaccharide isomers to John Clamp (of Britain). Clamp estimated 1,056 isomers for a trisaccharide composed of 3 different hexoses. The calculation combined three factors:
- 6 sequence permutations of 3 different monomers (3!);
- 8 combinations of α and β anomeric configurations at the three sugars (2^3);
- 16 possible attachments of the terminal and internal sugars to the 2-, 3-, 4- or 6-hydroxyl of their respective aglycones (4^2).

Since 6 × 8 × 16 = 768, it is not clear how Clamp arrived at 1,056. By comparison, 3 amino acids or 3 nucleotides in a row give only 3! = 6 isomers.

[18] Sharon, N. Complex Carbohydrates: Their Chemistry, Biosynthesis and Functions; Addison-Wesley, Advanced Book Program: Reading, MA, 1975; p 7.

However, Clamp and Sharon did not consider repeating sugars, ring size or branching, so both underestimated the number of isomers by nearly 2 orders of magnitude. In 1986, Richard Schmidt published a table giving 720 isomers for a trisaccharide, 34,560 for a tetrasaccharide and 2,144,640 for a pentasaccharide [19]. In 1988, Laine et al. published a formula that includes a ring-size term. They used it to estimate the isomers of a linear, reducing pentasaccharide with non-repeating units [20]:

$S^* = n! \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-1}$

The terms are:
- n is the number of monosaccharides connected to each other in the oligosaccharide;
- $2^{n}_{a}$ is the anomeric term;
- $2^{n}_{r}$ is the ring-size term;
- $4^{n-1}$ is the linkage-position term.

Applied to a linear pentasaccharide of 5 different, non-repeating hexoses, the formula gives 31,457,280 isomers, all of the same mass.

[19] Schmidt, R.R. Angew. Chem. Int. Ed. Engl. 1986, 25, 212-235.
[20] Laine, R.A.; Pamidimukkala, K.M.; French, A.L.; Hall, R.W.; Abbas, S.A.; Jain, R.K.; Matta, K.L. J. Am. Chem. Soc. 1988, 110, 6931-6939.

The true number of isomers is much larger, because of branching and the natural possibility of repeated monomers. In 1990, Carl G. Hellerqvist estimated 2.72 billion possible structures for a hexasaccharide containing amino sugars, fucose and hexoses [21]. His theme was to show how successive analytical steps reduce these numbers.

[21] Hellerqvist, C.G. Methods Enzymol. 1990, 193, 554-573.

Sugar monomers are often repeated in natural carbohydrates, just as amino acids are in peptides. Richard Schmidt, for example, considered repeating saccharides in a separate calculation (op. cit.). Consider the Clamp/Sharon expression $3! \cdot 2^3 \cdot 4^2$ for the trisaccharides available from a set of 3 hexoses. Its first term should have been $3^3 = 27$ instead of $3! = 6$. The total should also have been multiplied by a ring-size term, since most hexoses can occur in either the pyranose (6-membered) or the furanose (5-membered) form.

Considering the 5-membered ring increases the result for a trisaccharide by a factor of $2^3 = 8$. The furanose form also allows sugar A in a trisaccharide of sequence ABC to attach through the 5-position of sugar B. That would seem to raise the number of potential linkage positions on a hexose from 4 to 5. However, the ring-size term already accounts for this, because the pyranose form excludes the 5-position and the furanose form excludes the 4-position. The number of linkage-position possibilities therefore stays at $4^2 = 16$.
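Both earlier expressions can be evaluated directly. The sketch below is a minimal Python rendering of two counts:
- the Clamp/Sharon count, $n! \cdot 2^n \cdot 4^{n-1}$ with no ring-size term;
- the 1988 formula, which adds the ring-size term.

The function names are illustrative only.

```python
from math import factorial


def clamp_sharon(n: int) -> int:
    """n distinct hexoses, each used once: sequence x anomeric x linkage terms."""
    return factorial(n) * 2 ** n * 4 ** (n - 1)


def laine_1988(n: int) -> int:
    """Same count with the ring-size term (pyranose or furanose) added: an extra 2**n."""
    return factorial(n) * 2 ** n * 2 ** n * 4 ** (n - 1)


print(clamp_sharon(3))  # 768: 6 x 8 x 16 for a trisaccharide
print(laine_1988(5))    # 31457280 for a linear pentasaccharide
```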
Thus, the correct number of linear trisaccharides made from a set of 3 hexoses is 27 × 8 × 8 × 16 = 27,648. For comparison, a tripeptide built from 3 amino acids, with repeats allowed, has only $3^3 = 27$ possibilities.

Branching oligosaccharides

Oligosaccharides can have 2 or more sugars attached to the same residue. For example, sugars A and B can be attached to sugar C in different ways:

A(1->6)              B(1->6)
       \                    \
        C(1->R)*   or        C(1->R)
       /                    /
B(1->3)              A(1->3)

*R = reducing-end attachment site (protein, lipid or other aglycone)

A trisaccharide can also take a branched configuration, with sugars A and B both glycosidically attached to sugar C. For a pyranose C, the branching pattern is 2,3; 2,4; 2,6; 3,4; 3,6 or 4,6 (six possibilities). For a furanose C, the patterns are 2,3; 2,5; 2,6; 3,5; 3,6 and 5,6, for a total of 12 different branched structures. The ring-size term $2^{n}_{r}$, applied to the branching sugar, already accounts for the additional 6 structures.

Branched Carbohydrates

Each branch can be arranged in two ways, such as A6,B3 or B6,A3, so there are again 12 different ways to branch these three sugars. The permutation term $E^n$, however, already covers this A6,B3 / B6,A3 pair. The branched trisaccharides from a set of 3 hexoses, each distinct from the linear structures, therefore number 27 × 8 × 8 × 6 = 10,368:

$S^*_{\text{branched}} = E^{n} \cdot 2^{n}_{r} \cdot 2^{n}_{a} \cdot 6^{n-2}$

From a set of only 3 different hexoses, the total is therefore 27,648 linear forms plus 10,368 branched forms, or 38,016 trisaccharides. This is roughly 36 to 53 times the estimates of 720 to 1,056 by Clamp, Sharon and Schmidt.
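The corrected trisaccharide count can be reproduced term by term. This minimal sketch uses E = 3 kinds of hexose and n = 3 residues, as in the text. The variable names are only illustrative.

```python
E, n = 3, 3  # a set of 3 hexoses, chains of 3 residues

sequences = E ** n                # 27: any of the 3 hexoses at each position
anomeric = 2 ** n                 # 8: alpha or beta at each residue
ring_size = 2 ** n                # 8: pyranose or furanose at each residue
linear_linkages = 4 ** (n - 1)    # 16: 4 positions for each of the 2 glycosidic bonds
branch_patterns = 6 ** (n - 2)    # 6: 2,3 2,4 2,6 3,4 3,6 4,6 on the branching sugar

linear = sequences * anomeric * ring_size * linear_linkages
branched = sequences * anomeric * ring_size * branch_patterns

print(linear, branched, linear + branched)  # 27648 10368 38016
print(3 ** 3)                               # 27 tripeptides, for comparison
```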
The formula for the isomers of a trisaccharide with a reducing end is thus:

$S^* = E^{n} \cdot 2^{n}_{r} \cdot 2^{n}_{a} \cdot 4^{n-1}\ (\text{linear forms}) + E^{n} \cdot 2^{n}_{r} \cdot 2^{n}_{a} \cdot 6^{n-2}\ (\text{branched forms})$

Analytical Challenge

NMR as a single spectroscopic method: each trisaccharide contains 15 ring protons, including the anomeric protons. A proton NMR spectrum would therefore need to resolve 38,016 × 15 = 570,240 "different" proton environments within about 0.5 ppm. That window is the natural dispersion of the ring protons; the anomeric protons lie downfield.

It is doubtful that even a tenth of this number of lines could be resolved by multidimensional proton NMR, which would require a terahertz instrument. Today, a gigahertz instrument is about the practical limit. The carbon-13 spectrum is thirty times more dispersed, but it would need to resolve 38,016 × 18 carbons = 684,288 lines if they all happened to be different, which is impossible. NMR by itself therefore cannot absolutely identify trisaccharides or larger oligomers from chemical shift values.

Mass spectrometry: all 38,016 trisaccharide isomers have the same mass. Collisionally activated mass spectrometry causes partial fragmentation. Combining that partial degradation with spectral patterns might resolve parameters such as linkage position or anomeric configuration [22]. This approach may still not be sufficient without other sensitive chemical manipulations.

[22] Mendonca, S.; Cole, R.B.; Zhu, J.; Cai, Y.; French, A.D.; Johnson, G.P.; Laine, R.A. Incremented Alkyl Derivatives Enhance Collision Induced Glycosidic Bond Cleavage in Mass Spectrometry of Disaccharides. J. Am. Soc. Mass Spectrom. 2003, 14, 63-78. Yoon, E.; Laine, R.A. Biol. Mass Spectrom. 1992, 21, 479-485. Laine, R.A.; Yoon, E.; Mahier, T.J.; Abbas, S.A.; de Lappe, B.W.; Jain, R.K.; Matta, K.L. Biol. Mass Spectrom. 1991, 20, 505-514. Laine, R.A. Glycoconjugates: Overview and Strategy. In Mass Spectrometry (McCloskey, J.A., Ed.); Methods Enzymol. 1990, 193, 539-553. Laine, R.A. Methods Enzymol. 1989, 179, 157-164. Laine, R.A.; Pamidimukkala, K.M.; French, A.L.; Hall, R.W.; Abbas, S.A.; Jain, R.K.; Matta, K.L. J. Am. Chem. Soc. 1988, 110, 6931-6939.

Non-reducing oligosaccharides

Trisaccharides can also be built around a non-reducing internal linkage. Two types are the trehalose-type aldose(1->1)aldose and the sucrose/raffinose-type aldose(1->2)ketose. Larger oligosaccharides can also be linked head-to-tail in cyclodextrin fashion. These permutations would add a large number to the calculation.

At first glance, the linear permutation number calculated below would be multiplied by 4 to count the "cyclodextric" hexasaccharides. The factor of 4 is the linkage term added by the extra head-to-tail linkage, and it makes the cyclodextrics alone close to 0.8 trillion. However, cyclic structures have no reducing or non-reducing terminal. Many of the cyclic "isomers" might therefore be identical, depending on the chosen starting position, and this estimate needs further work.

To keep things simple, this lecture is limited to the much more common reducing-end saccharides. No estimate of all isomers resulting from oligosaccharide branching has been reported, so this approach is new.

To address carbohydrate isomers of biologically relevant size more thoroughly, we estimate all possible isomers of a reducing hexasaccharide built from a set of 6 hexoses in the D-configuration. Both D- and L-hexoses occur in nature, especially in plants, fungi and microbes, so the true number of possible isomers is higher still, by a factor of $2^6$.

Although this calculation considers only the D-isomers, the pure L-forms generate an equal number. Including the mixed D,L forms as well multiplies the total by 64.
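The two rough multipliers in this passage can be made explicit. The sketch restates only the text's first-pass reasoning. The factor of 4 for the extra head-to-tail linkage over-counts cyclic structures that are identical up to the choice of starting residue, so treat it as an upper estimate. The linear count is equation A', evaluated later in the text.

```python
# Equation A' for a linear hexasaccharide from a set of 6 D-hexoses.
linear_hexasaccharides = 6 ** 6 * 2 ** 6 * 2 ** 6 * 4 ** 5

# First-pass cyclic estimate: the extra head-to-tail linkage adds one more
# linkage-position factor of 4. Rotations of the same ring are not removed,
# so this over-counts.
cyclic_upper_estimate = 4 * linear_hexasaccharides

# D or L at each of the 6 positions: 2**6 stereochemical assignments,
# including the all-D set already counted and the all-L mirror set.
d_l_factor = 2 ** 6

print(f"{linear_hexasaccharides:,}")   # 195,689,447,424
print(f"{cyclic_upper_estimate:,}")    # 782,757,789,696
print(d_l_factor)                      # 64
```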
LINEAR STRUCTURES:

The total number of possible structures, S*, of a D-hexose-containing hexasaccharide starts from the value for a linear chain of 6 different, non-repeating sugars ABCDEF. The general formula is:

A: $S^* = n! \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-1}$

The terms are:
- n is the number of different hexoses in the string;
- n! is the linear permutation term, with no sugar monomer repeated (6! = 720);
- $2^{n}_{a}$ is the anomeric term ($2^6 = 64$);
- $2^{n}_{r}$ is the ring-size term, pyranose or furanose ($2^6 = 64$);
- $4^{n-1}$ is the linkage-position term ($4^5 = 1024$).

When both pyranose and furanose forms are considered, the hydroxyls on all 5 carbons 2-6 can participate in linkages. However, the pyranose form excludes the 5-position and the furanose form excludes the 4-position, so the ring-size term above already accounts for this part of the linkage count. For linear, non-repeating hexasaccharides with D stereochemistry only, the number is:

A: $S^* = 6! \cdot 2^6 \cdot 2^6 \cdot 4^5 = 3{,}019{,}898{,}880$ (three billion!)

Table 1. Linear isomers of D-hexoses, each hexose used once
_____________________________________________________________
Oligosaccharide size    Hexose set    Linear isomers
_____________________________________________________________
Monosaccharide          1             4
Disaccharide            2             128
Trisaccharide           3             6,144
Tetrasaccharide         4             393,216
Pentasaccharide         5             31,457,280
Hexasaccharide          6             3,019,898,880
_____________________________________________________________

If any member of the 6-sugar set can be repeated, equation A becomes A':

A': $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-1}$

Here n is the length of the chain in monomers, and E is the number of different kinds of monomers (epimers) in the set. $E^n$ is the linear permutation term when individual sugar types can be repeated within the chain. The remaining terms are the same as in equation A.
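Equations A and A' are simple products. Table 1 and the linear counts for repeated hexoses (E = n, the values tabulated next in Table 2) can therefore be regenerated in a few lines. This is a minimal Python sketch; `linear_distinct` and `linear_repeats` are illustrative names, not from the original. The last line checks the share of all mono- to pentasaccharides relative to the hexasaccharides.

```python
from math import factorial

NAMES = ["Mono", "Di", "Tri", "Tetra", "Penta", "Hexa"]


def linear_distinct(n: int) -> int:
    """Equation A: n different D-hexoses, each used once, in a linear chain."""
    return factorial(n) * 2 ** n * 2 ** n * 4 ** (n - 1)


def linear_repeats(n: int, E: int) -> int:
    """Equation A': chains of length n drawn, with repetition, from E kinds of hexose."""
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 1)


for n, name in enumerate(NAMES, start=1):
    # Column 1: each hexose once (Table 1); column 2: repeats allowed, E = n.
    print(f"{name}saccharide {linear_distinct(n):>16,} {linear_repeats(n, n):>20,}")

# Share of all mono- to pentasaccharides relative to the hexasaccharides.
smaller = sum(linear_repeats(n, n) for n in range(1, 6))
print(f"{100 * smaller / linear_repeats(6, 6):.2f}%")  # 0.42%
```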
In this case, the number of permutations for a linear hexasaccharide is:

A': $S^* = 6^6 \cdot 2^6 \cdot 2^6 \cdot 4^5 = 46{,}656 \cdot 64 \cdot 64 \cdot 1{,}024 = 195{,}689{,}447{,}424$

That is nearly 200 billion, an astonishing number!

Table 2. Linear isomers from a set of 1-6 D-hexoses
_____________________________________________________________
Oligosaccharide size    Hexose set    Linear isomers
_____________________________________________________________
Monosaccharide          1             4
Disaccharide            2             256
Trisaccharide           3             27,648
Tetrasaccharide         4             4,194,304
Pentasaccharide         5             819,200,000
Hexasaccharide          6             195,689,447,424
_____________________________________________________________

Note that all of the mono- to pentasaccharides together amount to less than 0.5% of the hexasaccharide isomers.

[Figure: Linear oligosaccharide isomers in D-hexoses. y-axis: oligosaccharide isomers, log scale from 10^0 to 10^12; x-axis: degree of linear polymerization in D-hexoses, 1 to 6.]

Analysis, synthesis

With this many structures, the technological barrier to simple, one-method analytical differentiation is even more apparent than for the trisaccharides noted above. Organic synthesis of one pure hexasaccharide among 0.2 × 10^12 possible structures is likewise a daunting task. Indeed, most oligosaccharide synthesis chemists estimate that a trisaccharide takes 20 man-weeks to synthesize, compared with 3 hours for a tripeptide. There are few 95%-yield reactions in oligosaccharide synthesis, although some new automated machines are changing this.

A large number of biologically possible branched-chain compounds would increase the numbers above still further.

BRANCHED STRUCTURES:

Throughout, the monosaccharide in position "F" is the reducing end, designated "FR".
MONOSACCHARIDE BRANCHES:

Examples of singly branched compounds (the branch sugar A is attached to the residue named):

I     B->C->D->E->FR with branch A on C
II    B->C->D->E->FR with branch A on D
III   B->C->D->E->FR with branch A on E
IV    B->C->D->E->FR with branch A on FR

From here on the arrows are omitted; they are understood to point toward the reducing end FR.

Each of these singly branched species can be treated as a separate saccharide with a fixed branch point. The branching sugar sits at a fixed location in the chain, while the branch can move among the hydroxyls of the branch-point sugar. All monosaccharides in the hexamer then contribute to isomers just as in the linear form. The difference is that branch positions can move among the carbons of each monomer able to carry a branch.

The general formula for oligosaccharide isomers carrying a single monosaccharide branch on the core chain is:

B: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-3} \cdot [6(n-2)]$

The terms are:
- n-2 is the number of core monosaccharides that can carry a monosaccharide branch;
- $4^{n-3}$ covers the permutations of linkage positions on the unbranched monomers within the chain;
- 6(n-2) covers the possible arrangements of branches on each of the hexoses in the chain able to carry a branch (n-2 of them).

In I, for example, the A and B residues on C can be attached as A,B or B,A. On a pyranose they occupy the 2,3; 2,4; 2,6; 3,4; 3,6 or 4,6 positions, and on a furanose the 2,3; 2,5; 2,6; 3,5; 3,6 or 5,6 positions. Permutations of the A, B and C monomers are already included in the $E^n$ term, so 12 possibilities remain for each branch position. The pyranose/furanose term $2^{n}_{r}$ already includes the alternate set of 6. The 6 branching patterns in each ring form thus yield 12 possibilities through the factor of 2 in the ring-size term, and the factor for single branches is 6(n-2).
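Equation B can be evaluated directly. In this sketch (the function name is illustrative), the count is zero below a trisaccharide, where no core residue can carry a branch. At n = 3 it reproduces the 10,368 branched trisaccharides derived earlier, and at n = 6 it gives the hexasaccharide figure.

```python
def single_mono_branch(n: int, E: int) -> int:
    """Equation B: one monosaccharide branch on one of the n-2 core residues that can carry it."""
    if n < 3:
        return 0  # no residue in a mono- or disaccharide can carry a branch
    sequences = E ** n
    anomeric_and_ring = 2 ** n * 2 ** n
    linkages = 4 ** (n - 3)             # linkage positions on the unbranched monomers
    branch_arrangements = 6 * (n - 2)   # 6 patterns per branch-point residue
    return sequences * anomeric_and_ring * linkages * branch_arrangements


print(single_mono_branch(3, 3))  # 10368
print(single_mono_branch(6, 6))  # 293534171136
```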
Summed over configurations I-IV, with n = 6 and E = 6, this gives:

B: $S^*_{B1} = 6^6 \cdot 2^6 \cdot 2^6 \cdot 4^3 \cdot [6(n-2)] = 46{,}656 \cdot 64 \cdot 64 \cdot 64 \cdot 24 = 293{,}534{,}171{,}136$

This first branching case alone gives nearly 300 billion additional possible structures!

DISACCHARIDE BRANCHES:

For hexasaccharides with a single disaccharide branch, the set is:

V     C-D-E-FR with branch AB on D
VI    C-D-E-FR with branch AB on E
VII   C-D-E-FR with branch AB on FR

and so on. V is the same as II: ABDEFR can be considered the core, with a single monosaccharide branch on D. VI and VII, however, are novel arrangements.

The formula for this set is:

C: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-3} \cdot [6(n-4)]$

Disaccharide branches generate compounds beyond the single branches already counted on only n-4 of the monomers. Tetrasaccharides and smaller therefore produce no novel compounds. For hexasaccharides the total is 46,656 × 64 × 64 × 64 × 12 = 146,767,085,568. These are novel structures beyond the linear and singly branched hexasaccharides made from 6 different hexoses.

TRISACCHARIDE BRANCHES:

Attached to the core chain:

VIII  D-E-FR with branch ABC on E
IX    D-E-FR with branch ABC on FR

VIII is the same as III, with sugar D as the single branch on the core ABCEF. IX is the same as VII, with a disaccharide branch on the reducing-end sugar FR. New compounds therefore occur only in heptasaccharides and larger, and the formula is:

D: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-3} \cdot \left[6 \cdot \frac{(n-6) + |n-6|}{2}\right]$

For a hexasaccharide or smaller, no compounds are generated beyond those already counted, so the result is 0.
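Equations C and D can be sketched the same way. The half-sum $((x) + |x|)/2$ in equation D keeps x when it is positive and gives 0 otherwise. The helper `positive_part` below (an illustrative name) implements exactly that and is reused for the n-4 sites of equation C. With $E^n = 6^6 = 46{,}656$, equation C gives 146,767,085,568 for a hexasaccharide.

```python
def positive_part(x: int) -> int:
    """((x) + |x|) / 2: x when x > 0, otherwise 0."""
    return (x + abs(x)) // 2


def single_di_branch(n: int, E: int) -> int:
    """Equation C: novel disaccharide branches, possible on n-4 core residues."""
    sites = positive_part(n - 4)
    if sites == 0:
        return 0  # tetrasaccharides and smaller give no new compounds
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 3) * 6 * sites


def single_tri_branch(n: int, E: int) -> int:
    """Equation D: trisaccharide branches give new compounds only from n = 7 upward."""
    sites = positive_part(n - 6)
    if sites == 0:
        return 0
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 3) * 6 * sites


print(single_di_branch(6, 6))   # 146767085568
print(single_tri_branch(6, 6))  # 0
```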
TETRASACCHARIDE BRANCHES:

X     E-FR with branch ABCD on FR

X is the same as IV, with a monosaccharide branch on FR. It produces new compounds only for octasaccharides and larger, giving:

E: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-3} \cdot \left[6 \cdot \frac{(n-7) + |n-7|}{2}\right]$

For a heptasaccharide or smaller this is 0.

DI-BRANCHED COMPOUNDS:

Two single branches on two different core monosaccharides:

XI-XIII  C-D-E-FR with single branches A and B on two different core residues (the three pairs drawn from D, E and FR)

All three are novel arrangements.

The factor of 6 branch combinations must now be applied to two monosaccharides in the core. The anomeric and other permutation terms stay the same:

F: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-4} \cdot \left[6^{2} \cdot \big((n-4) + (n-5) + \dots + (n-(n-1))\big)\right]$

Here $6^2$ accounts for the two branches. The series $(n-4) + (n-5) + \dots + 1$, whose last term is $n-(n-1) = 1$, counts the placements of the two branches along the core. The formula has no meaning for tetrasaccharides or smaller, where the series is empty. Pentasaccharides give (n-4) = 1. Hexasaccharides give (n-4) + (n-5) = 2 + 1 = 3, and heptasaccharides likewise give 6.

Numerically, for hexasaccharides with two monosaccharide branches: 46,656 × 64 × 64 × 16 × 36 × 3 = 330,225,942,528.

The general formula for B monosaccharide branches is:

F': $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-(B+2)} \cdot \left[6^{B} \cdot (\text{term for permutation of branches})\right]$

A heptasaccharide is the smallest compound capable of three single branches. An example is D-E-F-G-(reducing end) with single branches A, B and C on three core residues.

Two single branches on the same core monosaccharide (trisubstituted, or triple-branched) form another novel set:

XIV-XVI  C-D-E-FR with branches A and B both attached to the same core residue (D, E or FR)

and so on.
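The branch-placement series in equation F is the sum $1 + 2 + \dots + (n-4) = (n-4)(n-3)/2$. It gives the 1, 3 and 6 quoted for penta-, hexa- and heptasaccharides. The sketch below (illustrative names) computes equation F and the general form F'. It also checks that F' with B = 1 and the n-2 single-branch placements reproduces equation B. Equation E is included for completeness and is zero for anything smaller than an octasaccharide.

```python
def branch_pair_placements(n: int) -> int:
    """(n-4) + (n-5) + ... + 1: placements of two single branches on different core residues."""
    return sum(range(1, n - 3))  # empty sum (0) when n <= 4


def general_mono_branches(n: int, E: int, B: int, placements: int) -> int:
    """Equation F': B monosaccharide branches with the given number of placements."""
    if placements == 0:
        return 0
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - (B + 2)) * 6 ** B * placements


def two_mono_branches(n: int, E: int) -> int:
    """Equation F: the B = 2 case of F'."""
    return general_mono_branches(n, E, 2, branch_pair_placements(n))


def single_tetra_branch(n: int, E: int) -> int:
    """Equation E: tetrasaccharide branches, new only from octasaccharides upward."""
    sites = max(0, n - 7)  # same as ((n-7) + |n-7|) / 2
    if sites == 0:
        return 0
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 3) * 6 * sites


print([branch_pair_placements(n) for n in (5, 6, 7)])  # [1, 3, 6]
print(two_mono_branches(6, 6))                         # 330225942528
print(general_mono_branches(6, 6, 1, 6 - 2))           # 293534171136, equation B
print(single_tetra_branch(6, 6))                       # 0
```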
Including both pyranose and furanose forms, trisubstituted monosaccharides allow these branch patterns:
- pyranoses: (2,3,4), (2,3,6), (2,4,6) and (3,4,6);
- furanoses: (2,3,5), (2,3,6), (2,5,6) and (3,5,6).

Because of the ring-size factor $2^{n}_{r}$, only 4 of these 8 configurations need to be counted as novel. Each can carry its substituents in 6 arrangements (ABC, ACB, BAC, BCA, CAB, CBA), but these are not additional to those covered by the $E^n$ term. In a hexasaccharide, each of the n-3 = 3 eligible core positions can be trisubstituted in this way, hence the term 4(n-3):

G: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-4} \cdot [4(n-3)]$

This formula does not apply to trisaccharides or smaller. For a hexasaccharide: 46,656 × 64 × 64 × 16 × 12 = 36,691,771,392.

OTHER BRANCHING POSSIBILITIES:

Double disaccharide branches can occur on n-5 core monosaccharides. Hexasaccharides are therefore the smallest oligosaccharides for which this set produces new compounds. In the example below, F is trisubstituted. The same structure can also be read as a single monosaccharide branch plus a disaccharide branch on a trisaccharide core.

XVII  E-FR with disaccharide branches AB and CD both on FR

H: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-4} \cdot [4(n-5)]$

This is not valid for pentasaccharides or smaller. For a hexasaccharide: 46,656 × 64 × 64 × 16 × 4 = 12,230,590,464.

A combination of one monosaccharide branch and one disaccharide branch on different core monosaccharides:

XVIII, XIX  D-E-FR carrying a disaccharide branch AB and a monosaccharide branch C

and so on.
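Equations G and H differ only in their branch-site factor. This minimal sketch (illustrative function names) evaluates both and returns zero below the sizes at which the text says each formula applies.

```python
def same_residue_pair(n: int, E: int) -> int:
    """Equation G: two single branches on one core residue (trisubstitution).

    4 novel substitution patterns per residue (the other 4 are covered by the
    ring-size term), on n-3 eligible core residues.
    """
    if n < 4:
        return 0  # not valid for trisaccharides or smaller
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 4) * 4 * (n - 3)


def double_di_branch(n: int, E: int) -> int:
    """Equation H: two disaccharide branches on one residue, possible on n-5 residues."""
    if n < 6:
        return 0  # not valid for pentasaccharides or smaller
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 4) * 4 * (n - 5)


print(same_residue_pair(6, 6))  # 36691771392
print(double_di_branch(6, 6))   # 12230590464
```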
XVIII is the same as XIII, with core ABEF. XIX has the core DEF (or ABF), with F as a triply branched reducing end, and is therefore the same as XVII.

We must also consider a new class of branched compounds that carry a single trisaccharide branch which is itself branched:

XX    D-E-FR with the branched trisaccharide C(A)(B) (A and B both on C) attached to E
XXI   D-E-FR with the same branched trisaccharide attached to FR

Examination shows that XX has the core ACEF (or BCEF), the same as XIII and XVIII. XXI is branched on the reducing end by one disaccharide and one branched trisaccharide, so it is the same compound as XIX. As the saccharide core grows longer, this series can form new compounds. It also introduces another kind of two-branched structure within one molecule. For oligosaccharides with a single branched trisaccharide branch, each of the 2 branch points allows 6 substitution patterns, giving $6^2$ as in equation F above. This does not apply to tetrasaccharides or smaller.

J: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-5} \cdot \left[6^{1+(n-5)}\right]$

For a hexasaccharide: 46,656 × 64 × 64 × 4 × 36 = 27,518,828,544.

The linkage term $4^{n-5}$ represents the singly substituted core saccharides. The term $6^{1+(n-5)}$ represents the branch permutations of the single branched trisaccharide ("1+") and of the disubstituted core monosaccharides ("(n-5)"). Larger saccharides will present much more complex branching permutations.

One further compound can still be envisioned: a variation on XXI in which D and E are both attached directly to the reducing-end F.

XXII  FR carrying D, E and the branched trisaccharide C(A)(B)

This compound also has a triple branch on F. It opens the way to another form of triple-branched oligosaccharide, in which a monosaccharide and a branched trisaccharide are both substituted onto a core monosaccharide.

The triply branched FR allows 4 variations, as in equation G above. Monosaccharide C in this illustration gives 6 variations, as in equation B above.
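Equation J follows the same pattern, with an exponent on the branch-permutation factor. The minimal sketch below uses an illustrative name and returns zero below a pentasaccharide, per the validity limit stated above.

```python
def branched_tri_branch(n: int, E: int) -> int:
    """Equation J: a single trisaccharide branch that is itself branched."""
    if n < 5:
        return 0  # not valid for tetrasaccharides or smaller
    linkages = 4 ** (n - 5)             # singly substituted core residues
    branch_perms = 6 ** (1 + (n - 5))   # the branched trisaccharide ("1+") and (n-5) core residues
    return E ** n * 2 ** n * 2 ** n * linkages * branch_perms


print(branched_tri_branch(6, 6))  # 27518828544
```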
No singly substituted hexoses occur in saccharides of this general structure smaller than 7-mers. The formula is not valid for pentasaccharides or smaller:

K: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot 4^{n-6} \cdot [4(n-5)] \cdot 6$

For a hexasaccharide: 46,656 × 64 × 64 × 1 × 4 × 6 = 4,586,471,424.

Tetra-branched versions are also possible:

XXIII  FR carrying three monosaccharides and one disaccharide (FR fully substituted)
XXIV   E-FR with A, B, C and D all attached to E (E fully substituted)

Tetra-branched hexoses are fully substituted, at 2,3,4,6 for pyranoses or 2,3,5,6 for furanoses. No other branching is possible, so the permutation term $E^n$ covers every possibility with one exception. The disaccharide in structure XXIII can occupy 4 different sites on FR, which creates another factor of 4 for that structure.

L: $S^* = E^{n} \cdot 2^{n}_{a} \cdot 2^{n}_{r} \cdot \left(4^{n-5} \cdot [n-4]\right) \cdot 4^{n-5}$

This is not valid for tetrasaccharides or smaller. The penultimate term [n-4] is the number of core saccharides capable of tetrasubstitution. The last term accounts for substitution of the disaccharide on the hydroxyls of F in XXIII.

In heptasaccharides and larger, a trisaccharide branch could also be inserted in a compound analogous to XXIII, while a disaccharide branch would appear in the analog of XXIV. Extra terms must therefore be added to equation L for larger oligosaccharides. For a hexasaccharide: 46,656 × 64 × 64 × 4 × 2 × 4 = 6,115,295,232.

This set of equations appears to cover all possibilities for a hexasaccharide or smaller in which F is the reducing end or is attached to an aglycone.

Hepta-, octa- and nonasaccharides offer higher orders of branching. Decasaccharides offer the first possibility of quadruple-branched saccharides (residues A-J, with J-R as the reducing end). They also allow three tri-branched residues, as in a G-H-I-J-(reducing end) core carrying the branches A, B, C on one side and D, E, F on the other.

Higher orders of branching can be envisioned.
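Equations K and L above can be evaluated in the same way. This sketch (illustrative names) reads the last factor of equation L as $4^{n-5}$, the same form as the preceding linkage term. At n = 6 that factor equals the factor of 4 described for structure XXIII.

```python
def mono_plus_branched_tri(n: int, E: int) -> int:
    """Equation K: a monosaccharide and a branched trisaccharide on one core residue."""
    if n < 6:
        return 0  # not valid for pentasaccharides or smaller
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 6) * 4 * (n - 5) * 6


def tetra_substituted(n: int, E: int) -> int:
    """Equation L: a fully (tetra)substituted core residue."""
    if n < 5:
        return 0  # not valid for tetrasaccharides or smaller
    core_sites = n - 4                 # core residues capable of tetrasubstitution
    disaccharide_sites = 4 ** (n - 5)  # placements of the disaccharide on F (4 when n = 6)
    return E ** n * 2 ** n * 2 ** n * 4 ** (n - 5) * core_sites * disaccharide_sites


print(mono_plus_branched_tri(6, 6))  # 4586471424
print(tetra_substituted(6, 6))       # 6115295232
```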
However, most biological activities are contained within a reasonably sized proteinaceous binding site of 6 sugars or, usually, fewer, as exemplified by antibodies, enzymes (lysozyme), heparinoids, or lectins (selectins). There are a few examples of proteins requiring larger oligomers for activity, e.g. a few enzyme recognition sites in the N-linked anabolic pathway for glycoprotein synthesis, which apparently recognize precursors as large as 14 sugars.

Taking all of the above calculations together, the total number of permutations for a hexasaccharide can be enumerated. The master equation is the sum of all equations A' through L; negative values obtained from the calculations should be regarded as zero.

Totals from equations A' to L for hexasaccharides made up of D-hexoses:

A'    195,689,447,424
B     293,534,171,136
C     146,452,512,768
D                   0
E                   0
F     330,225,942,528
G      36,691,771,392
H      12,230,590,464
J      27,518,828,544
K       4,586,471,424
L       6,115,295,232
Total: 1,053,045,031,000 (rounded; the listed terms sum exactly to 1,053,045,030,912)

Without considering L sugars or nonreducing forms, the total number of compounds from a hexasaccharide comprised of 6 different hexoses is the total of the above: more than 10^12 possible compounds. Including the mirror-image L sugar forms as stereochemical isomers within this set would increase this number by a factor of 64 (2^6, one D/L choice per residue), to about 6.7 × 10^13, more than 64 trillion.
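The master equation is a plain sum, so it can be sketched in a few lines. The per-equation values below are copied from the hexasaccharide totals; the dictionary and function names are hypothetical, and the D/L factor assumes one independent D or L choice for each of the 6 residues.

```python
# Per-equation totals for hexasaccharides of D-hexoses, copied from the list above
HEXASACCHARIDE_TERMS = {
    "A'": 195_689_447_424,
    "B": 293_534_171_136,
    "C": 146_452_512_768,
    "D": 0,
    "E": 0,
    "F": 330_225_942_528,
    "G": 36_691_771_392,
    "H": 12_230_590_464,
    "J": 27_518_828_544,
    "K": 4_586_471_424,
    "L": 6_115_295_232,
}


def master_total(terms):
    """Master equation: sum of A' through L, with negative values regarded as zero."""
    return sum(max(value, 0) for value in terms.values())


total = master_total(HEXASACCHARIDE_TERMS)
with_l_forms = total * 2 ** 6  # assumption: one D/L choice per residue
print(f"{total:,}")            # 1,053,045,030,912
print(f"{with_l_forms:.2e}")   # 6.74e+13
```

The exact sum is 1,053,045,030,912, which the quoted total rounds to 1,053,045,031,000.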
Table 3: Isomers Including Branches and Repeating Hexoses

Oligosaccharide size    Linear and branched isomers
Monosaccharide          4
Disaccharide            256
Trisaccharide           43,200
Tetrasaccharide         7,602,176
Pentasaccharide         2,633,600,000
Hexasaccharide          1,053,045,031,000

[Figure 1: Oligosaccharide Isomers from D-Hexoses. Isomers (log scale, 10^0 to 10^17) versus degree of polymerization (1 to 8) for branched and linear oligosaccharides, linear oligosaccharides, and peptides; annotation: "Octasaccharide Isomers Exceed 10e+17".]

Figure 1 shows the data from Tables 2 and 3 plotted along with data for peptides of the same length. Extrapolation in Figure 1 shows that the linear and branched totals would reach around 10^15 compounds for heptasaccharides and at least 10^18 for octasaccharides. Divergence from the linear forms increases from 1 log at heptamers to 2 logs at octamers; the divergence is due to an increase in branching types. A mole of isomers (about 6 × 10^23) exists for oligomers longer than eight residues (DP > 8).

While nature has not yet confounded us with numbers of compounds of such magnitude, this brings little comfort to the analyst or synthetic chemist who must, after all, conclude that the oligosaccharide in question is, absolutely, the single correct structure out of billions.

For oligosaccharide building blocks, organisms possess more possibilities than for peptides: there are more than 50 types of sugars, without considering non-saccharidic substitutions. Sugars are found substituted with acyl, alkyl, pyruvyl, sulfate, sulfonate, phosphate, phosphonate, and other groups, any one of which would raise the number of possible isomers, even for a singly substituted saccharide, above the number we have calculated.
That is, if one allows a single methyl group, for example, to be substituted anywhere on a hexasaccharide, about 2.1 × 10^13 new compounds could be envisioned. A hexasaccharide of hexoses carries 20 free hydroxyls (6 × 5 = 30, less 2 for each of the 5 glycosidic linkages), so the hexasaccharide total calculated above can be multiplied by a factor of 20. Each one of these would present a different antigen to an antibody, for example. Human antibody diversity is estimated at above 10^12.

A biological "code"

There is a very high number of possible epitopes for the establishment of a biological recognition "code", consisting on the one hand of the binding pocket of a specific protein, such as a lectin binding site, and on the other of the complex sugar structure. The "code" is embedded as a set of sequentially acting glycosyl transferases, where the "program" may be expressed differently in different tissues or under certain conditions. The calculation above points to the most complex biologically recognizable chemical "code" in a short sequence yet uncovered in nature.

The set model for this project can be described as a series of convex epitopes that have a direction; that is, they have one or more beginning (non-reducing) termini and only one (usually aldehyde or ketone) ending terminus. The latter is conventionally called the "reducing end" and is written with this "reducing" terminal monomer to the right. The epitopes can be populated with a set of epimeric monomers of defined size. These monomers can be linked to each other at one position on the left-hand side and usually at 4 different positions on the right. For each of the 4 positions there is a relation above (β) or below (α) the plane of the monomeric ring (D-forms). Each monomer can exist as a 5- or 6-membered ring, and all of these parameters are ordered in sequence.
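The set model can be made concrete as a small data structure. The sketch below is hypothetical: the class `Residue`, its fields, and the helpers `validate` and `free_hydroxyls` are illustrative names, not part of the original work. It assumes unmodified hexoses with 5 hydroxyls each, records for every residue its ring size, anomeric configuration, and the acceptor position it is linked to, and counts the free hydroxyls available for a single substitution (5n minus 2 per glycosidic linkage).

```python
from dataclasses import dataclass
from typing import List, Optional

# Acceptor positions available for branching (from the tetrasubstitution discussion)
ALLOWED_POSITIONS = {6: {2, 3, 4, 6}, 5: {2, 3, 5, 6}}  # pyranose, furanose


@dataclass
class Residue:
    name: str                          # monomer, e.g. "Glc" (hypothetical labels)
    ring_size: int                     # 6 = pyranose, 5 = furanose
    anomer: str                        # "alpha" (below the ring) or "beta" (above), D-forms
    linked_to: Optional[int] = None    # index of the acceptor residue; None for the reducing end
    position: Optional[int] = None     # hydroxyl position used on the acceptor


def validate(residues: List[Residue]) -> None:
    """Raise ValueError if a residue violates the set model."""
    reducing_ends = [r for r in residues if r.linked_to is None]
    if len(reducing_ends) != 1:
        raise ValueError("exactly one reducing end is required")
    for r in residues:
        if r.ring_size not in ALLOWED_POSITIONS or r.anomer not in ("alpha", "beta"):
            raise ValueError(f"invalid residue {r}")
        if r.linked_to is not None:
            acceptor = residues[r.linked_to]
            if r.position not in ALLOWED_POSITIONS[acceptor.ring_size]:
                raise ValueError(f"invalid linkage position {r.position}")


def free_hydroxyls(residues: List[Residue]) -> int:
    """5 OH per hexose, minus 2 consumed by each glycosidic linkage."""
    links = sum(1 for r in residues if r.linked_to is not None)
    return 5 * len(residues) - 2 * links


# A linear beta-1,4-linked hexasaccharide; residue 5 is the reducing end (written rightmost)
hexa = [Residue("Glc", 6, "beta", linked_to=i + 1, position=4) for i in range(5)]
hexa.append(Residue("Glc", 6, "alpha"))
validate(hexa)
print(free_hydroxyls(hexa))                              # 20
print(f"{free_hydroxyls(hexa) * 1_053_045_030_912:.2e}")  # 2.11e+13
```

Because every residue except the reducing end donates exactly one linkage, any tree-shaped hexasaccharide, branched or linear, gives the same count of 3n + 2 = 20 free hydroxyls.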
This examination of carbohydrate isomers is exhaustive for hexasaccharides and smaller, and covers most isomers for compounds up to octasaccharides, with the proviso that all possible branched compounds are to be considered and their terms added. The numbers are astronomical: the curve rises by more than 2 logs per monomer through pentasaccharides and by about 3 logs per monomer above heptasaccharides. The values obtained are especially surprising for such short oligomer sequences.

Because proteins can evolve more rapidly than carbohydrates (which require a substantial enzyme change to add a new sugar), saccharide structures are likely to be highly conserved over evolution compared with proteins, whose specificity could change with a single nucleotide (thus amino acid) mutation. An example exists in the literature, however, where a few amino acid changes switched the specificity of the transferase that confers the human A and B blood types between N-acetylgalactosamine and galactose (work of Hakomori et al.).

Those carbohydrate sequences in metazoans whose functions are conserved will probably be preserved across orders, such as the selectin ligands and heparinoids in mammals. Clearly, the requisite chemistry exists for much further biological evolution in carbohydrate-protein and, potentially, carbohydrate-carbohydrate and carbohydrate-nucleotide recognition systems.
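The growth rate quoted here can be read directly off Table 3 as the base-10 logarithm of the ratio between successive sizes. A minimal sketch, using only the Table 3 values; `TABLE_3` and `logs_per_monomer` are hypothetical names.

```python
import math

# Linear plus branched isomer counts from Table 3, keyed by degree of polymerization
TABLE_3 = {
    1: 4,
    2: 256,
    3: 43_200,
    4: 7_602_176,
    5: 2_633_600_000,
    6: 1_053_045_031_000,
}


def logs_per_monomer(table):
    """log10 increase in isomer count for each added monomer (DP - 1 -> DP)."""
    dps = sorted(table)
    return {dp: math.log10(table[dp] / table[dp - 1]) for dp in dps[1:]}


for dp, step in logs_per_monomer(TABLE_3).items():
    print(f"DP {dp - 1} -> {dp}: {step:.2f} logs")
```

With these values, every step from the disaccharide onward exceeds 2 logs per monomer, and the increment itself grows with size; only the monosaccharide-to-disaccharide step (log10 64) falls just below 2.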