The Complete Bins Dictionary

Carbohydrates Bin- Any entry that contains a carbohydrate moiety.
Saccharides Alone Bin- Any entry containing at least one monosaccharide unit
but not containing any protein, nucleic acid, or lipid.
Others Bin- Any “floating” (non-covalently attached to protein) entry containing
at least one monosaccharide unit that is covalently bound to another
component(s) that is not an identifiable saccharide such as a modified residue or
a ring complex.
Non-Covalently Bound Saccharides Bin- Any “floating” (non-covalently
attached to protein) entry containing at least one monosaccharide. The
monosaccharide may be covalently bound to another saccharide unit, forming a
di- or polysaccharide. The saccharide ligands may or may not be modified.
Nucleic Acid Containing Bin- Any entry containing at least one
monosaccharide unit that also contains nucleic acid.
DNA Bin- An entry that contains only DNA and carbohydrates.
RNA Bin- An entry that contains only RNA and carbohydrates.
Mixed Bin- Entries that contain DNA, RNA, carbohydrates, and/or
proteins.
Glycoproteins Bin- Proteins that contain oligosaccharide chains (glycans)
covalently attached to polypeptide side-chains. This covers N-glycosylation and
O-glycosylation with comparatively small, branched, usually unsulfated
carbohydrate units, devoid of repeating units.
O-Glycan Bin- Entries that contain a carbohydrate that is O-linked to the
protein through the amino acid Serine or Threonine.
N-Glycan Bin- Entries that contain a carbohydrate that is N-linked to the
protein through the amino acid Asparagine.
C-Glycan Bin- Entries that contain a carbohydrate that is C-linked to the
protein through the amino acid Tryptophan.
Proteoglycans Bin- Proteins that are heavily glycosylated. The basic
proteoglycan unit consists of a "core protein" with one or more covalently
attached glycosaminoglycan (GAG) chains. GAGs are long unbranched polysaccharides
consisting of a repeating disaccharide unit: a hexose or a hexuronic acid linked
to a hexosamine. The chains are Ser-linked and consist of long, unbranched runs
of alternating residues of hexosamine and uronic acid or galactose.
Heparin/Heparan Sulfate Bin- Entries that contain at least one unit of
glucuronic acid/iduronic acid and N-glucosamine.
Chondroitin Sulfate Bin- Entries that contain at least one unit of
Glucuronic acid and N-galactosamine. The GlcUA and GalN may have
modifications.
Hyaluronan Bin- Entries that contain at least one unit of glucuronic acid
linked to an N-glucosamine. GlcUA and GlcN may have modifications.
Keratan Sulfate Bin- Entries that contain at least one unit of galactose
linked to N-Acetylglucosamine. Either of these monosaccharides may be
modified.
Glycolipids Bin- Any entry that contains both carbohydrate and lipid moieties.
The term glycolipid designates any compound containing one or more
monosaccharide residues bound by a glycosidic linkage to a hydrophobic moiety
such as an acylglycerol, a sphingoid, a ceramide (N-acylsphingoid) or a prenyl
phosphate.
Non-Carbohydrate Bin- Any entry that does not contain a carbohydrate moiety.
Methodology of Searching
Finding Saccharides Alone
We used the advanced search and searched by Macromolecule Type, selecting “No” for
every option. 20 structures were found; 15 of them are GAGs.
Finding Proteoglycans
We used advanced search. Choose “chemical ID” for two criteria boxes, and put
one ID in each box. In the drop-down menu directly to the left of the “clear all
parameters” button at the bottom of the page, be sure to select “Results: Structures,” or
else the searches will not yield any results.
We got the search combinations from Wikipedia, from the chart in the
“classifications” section of the article.
http://en.wikipedia.org/wiki/Glycosaminoglycan
Refer to the Proteoglycan Search Combinations spreadsheet for the combinations we
used. For chondroitin sulfate, only BDP/NG6, BDP/NGA, GC4/NGA, GCD/NG6, GCU/NG6,
GCU/NGA, ASG/GCU, ASG/BDP, ASG/GC4, and ASG/GCD gave results.
For Heparin/Heparan Sulfate, only IDS/SGN, GC4/NAG, GCD/NAG, GCU/NAG, IDS/NAG,
UAP/SGN, and IDR/GNS gave results. For Keratan Sulfate, only G6S/NGS gave results. For
Hyaluronan, only BDP/NAG, GC4/NAG, GCD/NAG, GCU/NAG, IDS/NAG, IDS/SGN, and UAP/SGN
gave results.
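The chemical ID pairs listed above can be kept as a lookup table, so that an entry's ligand list can be checked against every pair at once. The following is a minimal sketch, not the procedure we actually ran (that was done by hand through the advanced search). The names GAG_PAIRS and candidate_gag_bins are made up for illustration, and each combination is read as a pair of two three-letter IDs.

```python
# Chemical ID pairs reported in the text as giving results, grouped by bin.
# Each combination is read as two three-letter IDs (for example BDP and NGA).
GAG_PAIRS = {
    "Chondroitin Sulfate": [
        ("BDP", "NG6"), ("BDP", "NGA"), ("GC4", "NGA"), ("GCD", "NG6"),
        ("GCU", "NG6"), ("GCU", "NGA"), ("ASG", "GCU"), ("ASG", "BDP"),
        ("ASG", "GC4"), ("ASG", "GCD"),
    ],
    "Heparin/Heparan Sulfate": [
        ("IDS", "SGN"), ("GC4", "NAG"), ("GCD", "NAG"), ("GCU", "NAG"),
        ("IDS", "NAG"), ("UAP", "SGN"), ("IDR", "GNS"),
    ],
    "Keratan Sulfate": [("G6S", "NGS")],
    "Hyaluronan": [
        ("BDP", "NAG"), ("GC4", "NAG"), ("GCD", "NAG"), ("GCU", "NAG"),
        ("IDS", "NAG"), ("IDS", "SGN"), ("UAP", "SGN"),
    ],
}


def candidate_gag_bins(ligand_ids):
    """Return {bin name: matching pairs} for pairs fully present in ligand_ids."""
    present = set(ligand_ids)
    hits = {}
    for bin_name, pairs in GAG_PAIRS.items():
        matched = [pair for pair in pairs if set(pair) <= present]
        if matched:
            hits[bin_name] = matched
    return hits
```

Several pairs (GCU/NAG, for example) appear under both Heparin/Heparan Sulfate and Hyaluronan, so a hit only names candidate bins. The entry still has to be inspected to decide which bin it belongs in.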
We also searched the chemical component dictionary for disaccharide units,
using the SMILES/SMARTS option in an advanced search. In the first criteria box, draw
the structure:
SMILES string: OC(=O)C1-,=C-,=C-,=C-,=CO1
In the second criteria box, draw the structure:
SMILES string: [#6]C(=O)NC1-,=C-,=C-,=C-,=CO1
We also did name searches, because some of the entries might not have the
regular carbohydrate components. The names searched were hyaluronan, chondroitin,
chondroitin sulfate, dermatan sulfate, keratan, keratan sulfate, heparin, and
heparan sulfate.
Finding Glycoproteins:
1. Refer to the spreadsheet Ser.xls to find Ser-linked O-glycans.
2. Delete columns B through K, P through S, and U through the end for easier
reading.
3. Look at the two IDs listed in columns B and C. If one of them is SER, look at
columns D and E, which list the atoms involved in each linkage. If one of these atoms is
an O and one of them is a C, it is a SER-linked O-glycan. This can be confirmed by
looking at the PDB entry on the RCSB website, opening the PDB file, and looking at the
link records.
4. Sometimes, in cases where one of the ligands involved is a Serine, the atoms
might be O and N, or O and O. These may be mistakes and need to be checked in
Chimera. All such instances have been annotated on the website.
5. O-glycans can also be linked to Threonine. Refer to the spreadsheet Thr.xls to
find these.
6. Again, delete columns B through K, P through S, and U through the end.
7. Repeat steps 3 and 4, except look for the amino acid THR.
8. The N-glycans are found in the spreadsheet Asn.xls. Repeat step 6.
9. Look in columns B and C for the amino acid asparagine (ASN). If ASN is one of
them, look at columns D and E for the atoms N and C. These are N-glycans. If the
atoms are N and N, or N and O, check in Chimera.
All PDB entries that contain SER, THR, or ASN in their link records are contained
in these spreadsheets. The spreadsheets are a summary of those link records.
However, any PDB entry can be checked to see if it contains a glycoprotein without use
of the spreadsheets. Simply look up the entry in the RCSB website, view the PDB file,
and use “ctrl + F” to find the link record section. If SER or THR is O-linked to a sugar, or
if ASN is N-linked to a sugar, it is a glycoprotein.
Sugars can be linked to the protein through amino acids other than SER, THR, or
ASN, but these are listed in the “other” bin and annotated according to which amino acid
they are linked to.
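The link-record check described above can also be sketched in code. This is a rough illustration rather than the tool we used. It reads the fixed-column PDB LINK record (atom names in columns 13-16 and 43-46, residue names in columns 18-20 and 48-50) and applies the same atom-pair rules as the spreadsheet steps. The function names are made up, the element is guessed from the first letter of the atom name, and the sketch does not verify that the partner residue is actually a sugar.

```python
def parse_link(line):
    """Pull atom and residue names out of one fixed-column PDB LINK record."""
    return {
        "atom1": line[12:16].strip(),
        "res1": line[17:20].strip(),
        "atom2": line[42:46].strip(),
        "res2": line[47:50].strip(),
    }


def element(atom_name):
    # Assumption: the first letter of the atom name is the element. This holds
    # for C, N, O, and S atoms in amino acids and sugars, but not for metals.
    return atom_name[:1]


def classify_link(line):
    """Apply the SER/THR and ASN atom-pair rules to a single LINK record."""
    rec = parse_link(line)
    residues = {rec["res1"], rec["res2"]}
    elements = {element(rec["atom1"]), element(rec["atom2"])}
    if residues & {"SER", "THR"}:
        return "O-glycan" if elements == {"O", "C"} else "check in Chimera"
    if "ASN" in residues:
        return "N-glycan" if elements == {"N", "C"} else "check in Chimera"
    return "other"


def classify_file(lines):
    """Classify every LINK record found in the lines of a PDB file."""
    return [classify_link(ln) for ln in lines if ln[:6] == "LINK  "]
```

A result of "other" only means that none of SER, THR, or ASN is involved. Whether the record describes a sugar attached through a different amino acid still has to be decided by looking at the entry.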
Sialic acid search
1. Searched by name.
Sialic Acid: 6 ligands, 150 entries.
Neuraminic Acid: 14 ligands, 70 entries.
2. Then searched by structure. Refer to Sialic Acid ligand.
Finding Glycolipids:
Reference: http://en.wikipedia.org/wiki/Glycolipid
We initially conducted a simple SMILES search using the string
[#6]~[#6]~[#6]~[#6]~[#6]~O[#6]~1~[#6]~[#6]~[#6]~[#6]~[#8]~1. We sorted through the results
one by one, deciding whether an entry was in fact a glycolipid. A lipid is defined by its
hydrophobicity and this was taken into account when sorting through the results. This SMILES
search, however, excluded results which contained sugar and lipid moieties in two or more
separate ligands, limiting the search yield.
For this reason, we used two separate SMILES strings [#6]~1~[#6]~[#6]~[#8]~[#6]~[#6]~1 and
[#6]~[#6]~[#6]~[#6]~[#6]~[#6]~[#6]~[#8] in the same search to find all chemical components that
contained both a hydrocarbon chain and sugar unit. Some results yielded glycolipids in one
single ligand while others were broken apart into separate sugar and lipid ligands. To make this
list of results more manageable, we subtracted the definite glycolipids from the first SMILES
search from this new list of results, giving us a smaller number of entries to sort through. We
looked into each entry’s PDB file to make sure that the lipid and sugar components were in fact
linked.
NOTE: this kind of search excluded glycolipids that had hydrocarbon chains less than 7 carbons
long. These glycolipids were found later when sorting through the unclassified lists.
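The subtraction step described above is a plain set difference. A minimal sketch follows, assuming each search result has been exported as a list of PDB IDs. The function name and the example IDs are placeholders, not real results.

```python
def remaining_to_review(two_string_hits, confirmed_glycolipids):
    """Entries from the two-string search that were not already confirmed
    as glycolipids in the first, single-string search."""
    return sorted(set(two_string_hits) - set(confirmed_glycolipids))


# Placeholder IDs for illustration only.
second_search = ["1ABC", "2DEF", "3GHI"]
confirmed_first = ["2DEF"]
to_review = remaining_to_review(second_search, confirmed_first)
```

Each entry left in to_review still needs its PDB file checked by hand to confirm that the lipid and sugar components are linked.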
We also did a SMILES search for steroids with the strings
[#6]~1~[#6]~[#6]~2~[#6]~[#6]~[#6]~3~[#6](~[#6]~[#6]~[#6]~4~[#6]~[#6]~[#6]~[#6]~[#6]~3~4)~[#6]~2~[#6]~1
and [#6]~1~[#6]~[#6]~[#8]~[#6]~[#6]~1. As before, we sorted through all results to
make sure the steroid was covalently bound to a sugar.
We began trying to subclassify the glycolipids we found, but this proved very difficult. The three
major classes of glycolipids (glycoglycerolipids, glycosphingolipids, and GPI anchors) were
relatively uncommon. The most common type of glycolipid was a detergent-like molecule such as
the ligand BOG, which cannot be classified into the major lipid categories. This molecule helps
make a membrane more pliable for studies in biochemistry and is found in many
cyclooxygenase- and cytochrome-containing entries. Although we did not get to it, it may be
useful to begin analyzing glycolipids according to their functions in different molecules and, from
there, find patterns in their molecular structure that would allow them to be subclassified. Classes of
molecules that may be interesting to look into include CD1 lipid antigen presenting molecules,
cytochrome molecules, porin molecules, cyclooxygenase molecules, and pigment molecules.
Finding non-covalently bound saccharides, “other,” and straight chain sugars:
We began with the list we generated with no link records and went through each entry
individually to decide whether to classify it as non-covalently bound saccharide (modified or
unmodified), other, or straight chain sugar. To determine this, we looked solely at each entry’s
ligands.
After we finished the no-link list, we proceeded to do the same type of classification with the
remainder list. To determine which category each entry belonged to for this list, we had to look
at the PDB file link records to see how the ligands were linked, if they were linked at all.
Annotations of the entries in Max’s website:
Modified Non-Covalently Bound Monosaccharides
These are annotated according to which functional groups they contain. The
terms used to annotate them are listed in the Annotations to Modified
Polysaccharides spreadsheet. However, many of these terms are repetitive (such as F,
fluorine, fluoro), so the annotations are not complete. We are in the process of going
through them and replacing repetitive terms with matching ones, so that someone can easily
search for the modification they are looking for. Column A of the spreadsheet contains
the terms we started out with, and Column D contains the incomplete list of more
refined and specific terms.
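Replacing repetitive terms with matching ones amounts to mapping every raw term to a single canonical term. The sketch below shows one way this could be done. The mapping is hypothetical, and the choice of "fluoro" as the canonical form is an assumption for illustration, not the term list in Column D.

```python
# Hypothetical mapping from raw annotation terms (lowercased) to one canonical
# term. Misspelled variants are listed too so that they collapse as well.
CANONICAL = {
    "f": "fluoro",
    "fluorine": "fluoro",
    "fluoro": "fluoro",
    "flourine": "fluoro",
    "flouro": "fluoro",
}


def normalize_terms(raw_terms):
    """Map each raw term to its canonical form, dropping repeats, keeping order."""
    seen = set()
    result = []
    for term in raw_terms:
        key = term.strip().lower()
        canonical = CANONICAL.get(key, key)  # unknown terms pass through
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

Terms that are not in the mapping pass through lowercased, so the table can be filled in gradually as more repetitive terms are identified.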
Antibiotics
Any entry that contains a carbohydrate ligand that can act as an antibiotic is
annotated as “antibiotic.” They are found in the Nucleic Acid and Other bins.
Other
In addition to being labeled as antibiotics, some of the “other” entries contain
sugars that are linked to the protein but are not glycoproteins. They are annotated
according to which amino acid(s) they are attached to. For example: GLU, CYS, LYS,
ASP.
Glycoproteins
In the “mistake N-glycans” bin, the annotations contain a description of the error.
In the “O-glycans” bin, there are several entries that contain errors, and those are
described in the annotations. Also, if a link is something other than C-O, it is annotated
(as O-N or O-O).
Proteoglycans
These are annotated according to the combinations of chemical IDs we searched
for to find each particular entry.