The Complete Bins Dictionary

Carbohydrates Bin- Any entry that contains a carbohydrate moiety.
Saccharides Alone Bin- Any entry containing at least one monosaccharide unit but not containing any protein, nucleic acid, or lipid.
Others Bin- Any “floating” (non-covalently attached to protein) entry containing at least one monosaccharide unit that is covalently bound to one or more other components that are not identifiable saccharides, such as a modified residue or a ring complex.
Non-Covalently Bound Saccharides Bin- Any “floating” (non-covalently attached to protein) entry containing at least one monosaccharide. The monosaccharide may be covalently bound to another saccharide unit, forming a di- or polysaccharide. The saccharide ligands may or may not be modified.
Nucleic Acid Containing Bin- Any entry containing at least one monosaccharide unit that also contains nucleic acid.
DNA Bin- Entries that contain only DNA and carbohydrates.
RNA Bin- Entries that contain only RNA and carbohydrates.
Mixed Bin- Entries that contain DNA, RNA, carbohydrates, and/or proteins.
Glycoproteins Bin- Proteins that contain oligosaccharide chains (glycans) covalently attached to polypeptide side chains. They carry N-glycosylation and O-glycosylation with comparatively small, branched, usually unsulfated carbohydrate units, devoid of repeating units.
O-Glycan Bin- Entries that contain a carbohydrate that is O-linked to the protein through the amino acid serine or threonine.
N-Glycan Bin- Entries that contain a carbohydrate that is N-linked to the protein through the amino acid asparagine.
C-Glycan Bin- Entries that contain a carbohydrate that is C-linked to the protein through the amino acid tryptophan.
Proteoglycans Bin- Proteins that are heavily glycosylated. The basic proteoglycan unit consists of a "core protein" with one or more covalently attached glycosaminoglycan (GAG) chains (long unbranched polysaccharides consisting of a repeating disaccharide unit.
The repeating unit consists of a hexose or a hexuronic acid linked to a hexosamine.) The GAG chains are Ser-linked and consist of long, unbranched chains of alternating residues of hexosamine and uronic acid or galactose.
Heparin/Heparan Sulfate Bin- Entries that contain at least one unit of glucuronic acid/iduronic acid and N-glucosamine.
Chondroitin Sulfate Bin- Entries that contain at least one unit of glucuronic acid and N-galactosamine. The GlcUA and GalN may have modifications.
Hyaluronan Bin- Entries that contain at least one unit of glucuronic acid linked to an N-glucosamine. The GlcUA and GlcN may have modifications.
Keratan Sulfate Bin- Entries that contain at least one unit of galactose linked to N-acetylglucosamine. Either of these monosaccharides may be modified.
Glycolipids Bin- Any entry that contains both carbohydrate and lipid moieties. The term glycolipid designates any compound containing one or more monosaccharide residues bound by a glycosidic linkage to a hydrophobic moiety such as an acylglycerol, a sphingoid, a ceramide (N-acylsphingoid), or a prenyl phosphate.
Non-Carbohydrate Bin- Any entry that does not contain a carbohydrate moiety.

Methodology of Searching

Finding Saccharides Alone
We used the advanced search and searched by Macromolecule Type, selecting “No” for every option. 20 structures were found; 15 of them are GAGs.

Finding Proteoglycans
We used the advanced search. Choose “chemical ID” for two criteria boxes and put one ID in each box. In the drop-down menu directly to the left of the “clear all parameters” button at the bottom of the page, be sure to select “Results: Structures,” or else the searches will not yield any results. We took the search combinations from Wikipedia, from the chart in the “classifications” section of the article: http://en.wikipedia.org/wiki/Glycosaminoglycan
Refer to the Proteoglycan Search Combinations spreadsheet for the combinations we used.
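The two-box chemical ID search described above returns the structures that contain both IDs. For anyone repeating the check offline, the logic can be sketched as follows, assuming each entry's ligand IDs have already been collected into a set. The function names are hypothetical, the entry names and ligand sets are invented placeholders, and IDS/SGN and GCU/NGA serve only as sample combinations.

```python
def entries_with_pair(entry_ligands, id_a, id_b):
    """Return the entries whose ligand set contains both chemical IDs."""
    return sorted(
        entry for entry, ligands in entry_ligands.items()
        if id_a in ligands and id_b in ligands
    )


def annotate_by_pairs(entry_ligands, pairs):
    """Map each matching entry to the 'A/B' combinations that found it."""
    annotations = {}
    for id_a, id_b in pairs:
        for entry in entries_with_pair(entry_ligands, id_a, id_b):
            annotations.setdefault(entry, []).append(f"{id_a}/{id_b}")
    return annotations


# Placeholder data: entry names and ligand sets are invented for illustration.
example_entries = {
    "ENTRY1": {"IDS", "SGN", "CA"},
    "ENTRY2": {"GCU", "NGA"},
    "ENTRY3": {"IDS", "NAG", "GCU", "NGA"},
    "ENTRY4": {"SGN"},
}
sample_pairs = [("IDS", "SGN"), ("GCU", "NGA")]

print(annotate_by_pairs(example_entries, sample_pairs))
```

An entry that matches more than one combination collects every matching pair, which is one way to record which searches found it.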
For chondroitin sulfate, only BDP/NG6, BDP/NGA, GC4/NGA, GCD/NG6, GCU/NG6, GCU/NGA, ASG/GCU, ASG/BDP, ASG/GC4, and ASG/GCD gave results.
For Heparin/Heparan Sulfate, only IDS/SGN, GC4/NAG, GCD/NAG, GCU/NAG, IDS/NAG, UAP/SGN, and IDR/GNS gave results.
For Keratan Sulfate, only G6S/NGS gave results.
For Hyaluronan, only BDP/NAG, GC4/NAG, GCD/NAG, GCU/NAG, IDS/NAG, IDS/SGN, and UAP/SGN gave results.

We also searched the chemical component dictionary for disaccharide units, using the SMILES/SMARTS option in an advanced search.
In the first criteria box, draw the structure with the SMILES string: OC(=O)C1-,=C-,=C-,=C-,=CO1
In the second criteria box, draw the structure with the SMILES string: [#6]C(=O)NC1-,=C-,=C-,=C-,=CO1

We also did name searches (hyaluronan, chondroitin, chondroitin sulfate, dermatan sulfate, keratan, keratan sulfate, heparin, heparan sulfate) because some of the entries might not contain the regular carbohydrate components.

Finding Glycoproteins:
1. Refer to the spreadsheet Ser.xls to find Ser-linked O-glycans.
2. Delete columns B through K, P through S, and U through the end for easier reading.
3. Look at the two IDs listed in columns B and C. If one of them is SER, look at columns D and E, which list the atoms involved in each linkage. If one of these atoms is an O and the other is a C, it is a SER-linked O-glycan. This can be confirmed by looking up the PDB entry on the RCSB website, opening the PDB file, and looking at the link records.
4. Sometimes, in cases where one of the ligands involved is a serine, the atoms might be O and N, or O and O. These may be mistakes and need to be checked in Chimera. All such instances have been annotated on the website.
5. O-glycans can also be linked to threonine. Refer to the spreadsheet Thr.xls to find these.
6. Again, delete columns B through K, P through S, and U through the end.
7. Repeat steps 3 and 4, except look for the amino acid THR.
8. The N-glycans are found in the spreadsheet Asn.xls. Repeat step 6.
9.
Look in columns B and C for the amino acid asparagine (ASN). If ASN is one of them, look at columns D and E for the atoms N and C. These are N-glycans. If the atoms are N and N, or N and O, check in Chimera.

All PDB entries that contain SER, THR, or ASN in their link records are contained in these spreadsheets. The spreadsheets are a summary of those link records. However, any PDB entry can be checked for a glycoprotein without the spreadsheets. Simply look up the entry on the RCSB website, view the PDB file, and use “ctrl + F” to find the link record section. If SER or THR is O-linked to a sugar, or if ASN is N-linked to a sugar, it is a glycoprotein. Sugars can be linked to the protein through amino acids other than SER, THR, or ASN, but these are listed in the “other” bin and annotated according to which amino acid they are linked to.

Sialic acid search
1. Searched by name.
   Sialic Acid: 6 ligands, 150 entries
   Neuraminic Acid: 14 ligands, 70 entries
2. Then searched by structure. Refer to Sialic Acid ligand.

Finding Glycolipids:
Reference: http://en.wikipedia.org/wiki/Glycolipid
We initially conducted a simple SMILES search using the string [#6]~[#6]~[#6]~[#6]~[#6]~O[#6]~1~[#6]~[#6]~[#6]~[#6]~[#8]~1. We sorted through the results one by one, deciding whether each entry was in fact a glycolipid. A lipid is defined by its hydrophobicity, and this was taken into account when sorting through the results. This SMILES search, however, excluded results that contained sugar and lipid moieties in two or more separate ligands, limiting the search yield. For this reason, we used two separate SMILES strings, [#6]~1~[#6]~[#6]~[#8]~[#6]~[#6]~1 and [#6]~[#6]~[#6]~[#6]~[#6]~[#6]~[#6]~[#8], in the same search to find all chemical components that contained both a hydrocarbon chain and a sugar unit. Some results contained the glycolipid as one single ligand, while in others it was broken apart into separate sugar and lipid ligands.
To make this list of results more manageable, we subtracted the definite glycolipids found in the first SMILES search from this new list, giving us a smaller number of entries to sort through. We looked into each entry’s PDB file to make sure that the lipid and sugar components were in fact linked.
NOTE: this kind of search excluded glycolipids with hydrocarbon chains shorter than 7 carbons. These glycolipids were found later when sorting through the unclassified lists.

We also did a SMILES search for steroids with the strings [#6]~1~[#6]~[#6]~2~[#6]~[#6]~[#6]~3~[#6](~[#6]~[#6]~[#6]~4~[#6]~[#6]~[#6]~[#6]~[#6]~3~4)~[#6]~2~[#6]~1 and [#6]~1~[#6]~[#6]~[#8]~[#6]~[#6]~1. As before, we sorted through all results to make sure the steroid was covalently bound to a sugar.

We began trying to subclassify the glycolipids we found, but this proved very difficult. The three major classes of glycolipids (glycoglycerolipids, glycosphingolipids, and GPI anchors) were relatively uncommon. The most common type of glycolipid was a detergent-like molecule such as the ligand BOG, which cannot be classified into the major lipid categories. This molecule helps make a membrane more pliable for studies in biochemistry and is found in many cyclooxygenase- and cytochrome-containing entries. Although we did not get to it, it may be useful to begin analyzing glycolipids according to their functions in different molecules and, from there, to find patterns in their molecular structure that would allow them to be subclassified. Classes of molecules that may be interesting to look into include CD1 lipid antigen presenting molecules, cytochrome molecules, porin molecules, cyclooxygenase molecules, and pigment molecules.
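The manual check that separate sugar and lipid (or steroid) ligands are actually linked can be partly automated by reading the LINK records of a PDB file. The sketch below is a minimal illustration, not the procedure we followed: the function names are hypothetical, and it assumes the file uses the standard fixed-column LINK layout. It only sees linkages recorded in LINK records, so a glycolipid deposited as one single ligand will not appear.

```python
def parse_link_records(pdb_text):
    """Extract the two residues joined by each LINK record.

    Uses the fixed columns of the PDB format (1-based): residue names in
    columns 18-20 and 48-50, chain IDs in 22 and 52, residue numbers in
    23-26 and 53-56.
    """
    links = []
    for line in pdb_text.splitlines():
        if line[:6].strip() != "LINK":
            continue
        line = line.ljust(56)  # guard against truncated lines
        res1 = (line[17:20].strip(), line[21], line[22:26].strip())
        res2 = (line[47:50].strip(), line[51], line[52:56].strip())
        links.append((res1, res2))
    return links


def directly_linked(pdb_text, names_a, names_b):
    """True if any LINK record joins a residue in names_a to one in names_b."""
    for (name1, _, _), (name2, _, _) in parse_link_records(pdb_text):
        if (name1 in names_a and name2 in names_b) or (
            name1 in names_b and name2 in names_a
        ):
            return True
    return False
```

For example, after reading a downloaded PDB file into a string, directly_linked(text, {"GLC"}, {"LIP"}) reports whether a GLC residue is linked to a residue named LIP, where LIP is a placeholder for whichever lipid ligand ID the entry contains.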
Finding non-covalently bound saccharides, “other,” and straight chain sugars:
We began with the list we generated of entries with no link records and went through each entry individually to decide whether to classify it as non-covalently bound saccharide (modified or unmodified), other, or straight chain sugar. To determine this, we looked solely at each entry’s ligands. After we finished the no-link list, we did the same type of classification with the remainder list. To determine which category each entry in this list belonged to, we had to look at the PDB file link records to see how the ligands were linked, if they were linked at all.

Annotations of the entries in Max’s website:

Modified Non-Covalently Bound Monosaccharides
These are annotated according to which functional groups they contain. The terms used for annotation are listed in the Annotations to Modified Polysaccharides spreadsheet. However, many of these terms are repetitive (such as F, fluorine, fluoro), so the annotations are not complete. We are in the process of going through them and replacing repetitive terms with matching ones, so that someone can easily search for the modification they are looking for. Column A of the spreadsheet contains the terms we started out with, and Column D contains the incomplete list of more refined and specific terms.

Antibiotics
Any entry that contains a carbohydrate ligand that can act as an antibiotic is annotated as “antibiotic.” These entries are found in the Nucleic Acid and Other bins.

Other
In addition to those labeled as antibiotics, some of the “other” entries contain sugars that are linked to the protein but are not glycoproteins. They are annotated according to which amino acid(s) they are attached to, for example: GLU, CYS, LYS, ASP.

Glycoproteins
In the “mistake N-glycans” bin, the annotations contain a description of the error. In the “O-glycans” bin, there are several entries that contain errors, and those are described in the annotations.
Also, if a link is something other than C-O, it is annotated (as O-N or O-O).

Proteoglycans
These are annotated according to the combinations of chemical IDs we searched for to find each particular entry.
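The link-record rules under Finding Glycoproteins, together with the C-O, O-N, and O-O link annotations, can be written as a small check over one spreadsheet row (the two IDs from columns B and C and the two atoms from columns D and E). This is a hedged sketch only: the function name and return strings are hypothetical, it takes the first letter of each atom name as the element, and it does not confirm that the partner residue is a sugar. Element pairs are reported alphabetically, so an O-N link appears as N-O.

```python
def classify_link(res1, res2, atom1, atom2):
    """Classify one link row by amino acid and by the two linked elements."""
    residues = {res1.strip().upper(), res2.strip().upper()}
    # Assumption: the first letter of the atom name is its element,
    # which holds for names such as OG, OG1, ND2, and C1.
    elements = "-".join(sorted(a.strip().upper()[0] for a in (atom1, atom2)))
    if residues & {"SER", "THR"}:
        # O-glycans need an oxygen-to-carbon link; anything else is suspect.
        return "O-glycan" if elements == "C-O" else f"check in Chimera ({elements})"
    if "ASN" in residues:
        # N-glycans need a nitrogen-to-carbon link.
        return "N-glycan" if elements == "C-N" else f"check in Chimera ({elements})"
    # Links through other amino acids go to the "other" bin.
    return "other"


print(classify_link("SER", "MAN", "OG", "C1"))
print(classify_link("NAG", "ASN", "C1", "ND2"))
print(classify_link("THR", "NAG", "OG1", "O1"))
```

Rows flagged for checking still need to be inspected by hand, as described in steps 4 and 9.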