PubChem PubChem (1) is designed to provide information on biological activities of small molecules, generally those with molecular weight less than 500 daltons(2). PubChem's integration with NCBI's Entrez (3) information retrieval system provides sub/structure, similarity structure, bioactivity data as well as links to biological property information in PubMed and NCBI's Protein 3D Structure Resource. PubChem Databases PubChem is comprised of three linked databases -PubChem Compound, PubChem Substance and PubChem Bioassay PubChem Compound (unique structures with computed properties) PubChem Compound (4) is a searchable database of chemical structures with validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored within PubChem Compounds are pre-clustered and cross-referenced by identity and similarity groups. PubChem Compound includes over 5M compounds. Molecular Name Searches (e.g., Tylenol, Benzene) allow searching with a variety of chemical synonyms, Chemical Property Range Searches (e.g., Molecular Weight between 100 and 200, Hydrogen Bond Acceptor Count between 3 and 5) allow searching for compounds with a variety of physical/chemical properties, and descriptors. Simple Elemental Searches (all compounds containing Gallium) allow searching with specific element restrictions. PubChem Substance (deposited structures) PubChem Substance (5) is a searchable database containing descriptions of chemical samples, from a variety of sources, and links to PubMed citations, protein 3D structures, and biological screening results available in PubChem BioAssay. PubChem Substance includes over 8M records. Substances with known content are linked to PubChem Compound. Molecule Synonym Searches (e.g. all substances with 'deoxythymidine' as a name fragment, or substances that contain 3'-Azido-3'-deoxythymidine). Biology Links Search (e.g. substances with tested, active or inactive bioassays). Combined Searches (e.g. substances that are 'Active in any BioAssay' and contain the element Ruthenium). PubChem BioAssay PubChem BioAssay (6) is a searchable database containing bioactivity screens of chemical substances described in PubChem Substance. PubChem BioAssay includes over 180 bioassays. Searchable descriptions of each bioassay are provided that include descriptions of screening procedural conditions and readouts. To Search for BioAssay Data Sets (e.g. HIV growth inhibition). To Browse or Download PubChem BioAssay Results (NCI AIDS Antiviral Assay) Searching PubChem PubChem Text Search PubChem Text Search for searching compound name, synonym or ID that defaults to PubChem Compound. The search results page offers a pull down 'databases' menu that allows searching in PubChem Substance, PubChem BioAssay and a variety of other Entrez databases. PubChem Chemical Structure Search PubChem Chemical Structure Search (7) has the following options: Search SMILES (including SMARTS or InChI) or Formula which includes a 'Sketch' link to a drawing program that converts structural diagrams to SMILES(exact), SMARTS(substructure) or InChI(exact) strings for searching. Clicking 'Done' on the 'structure editor' converts the structural diagram to the appropriate string and transfers it to the search box. Select Structure File allows importation of standard and common chemical file formats (8). Specify Search Type allows restriction to: same compound, similar compounds (9), formula or substructure. PubChem Indexes and Index Search PubChem Indexes and Index Search allows fielded/range searching from either the PubChem homepage or Entrez search page. A extensive list of field aliases and examples of range searching is provided (10). PubChem Search Results (11) PubChem Compound PubChem Compound results are derived from PubChem Substance records that provide structures. Since compounds are structurally unique, one compound may link to multiple substances. The default display is a compound summary with thumbnails with cross links(12) to each PubChem database, other NCBI databases, and depositor's databases. Clicking either the structure or SID link gives the full display which includes the compound's property data, description, related substance information, neighboring structures, and cross links. PubChem Substance PubChem Substance has unique records if the structure is not known or supplied. For example, Sulfated polymannuroguluronate, a novel anti-acquired immune deficiency syndrome (AIDS) drug candidate, and other natural products. The PubChem Substance Summary Record, --------------------------------------------------------------------------------------------------------SID: 3724242 Links Sulfated polymannuroguluronate, AIDS218087 ... Source: NIAID(218087) --------------------------------------------------------------------------------------------------------is linked to the full record by clicking on the SID number (PubChem's substance identifier). This displays the full substance record, that includes links: to PubMed and the source; the Medical Subject Annotation (MESH Substance Name) and a MESH PubMed search link; and depositor supplied synonyms and comments. PubChem BioAssay The PubChem BioAssay Summary Record, ------------------------------------------------------------------------------AID: 179 NCI AIDS Antiviral Assay Source: DTP/NCI 15 Readouts, 37678 substances tested ------------------------------------------------------------------------------- Links is linked to the full record by clicking on the AID number (PubChem's assay (protocol) identifier). This displays the full bioassay record, that includes: links to the substances tested (all, active, inactive, inconclusive) and related PubMed, Protein, Taxonomy, OMIM and related BioAssay records; and a description of the assay possibly with protocols and comments. References: 1a. PubChem http://pubchem.ncbi.nlm.nih.gov/ 1b. PubChem - Overview http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_Overview 1c. PubChem FAQ http://pubchem.ncbi.nlm.nih.gov/help.html#faq 1d. PubChem Glossary http://pubchem.ncbi.nlm.nih.gov/help.html#Glossary 1e. PubChem - Help http://pubchem.ncbi.nlm.nih.gov/help.html "Provides tips and examples for searches of the three PubChem databases by text term/keyword, as well as tips for searching PubChem Compound by chemical properties. The help documents for structure search provide tips on using chemical information for basic and advanced structure search options in the PubChem Structure Search." 2. NIH Roadmap for Medical Research. Molecular Libraries and Imaging. http://nihroadmap.nih.gov/molecularlibraries 3. Entrez Databases http://www.ncbi.nlm.nih.gov/About/tools/restable_mol.html 4a. PubChem Compound http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pccompound "Compound -- Chemical representatives in a substance. Chemical structure presented in a compound is standardized through PubChem's data pipeline. A mixture substance may have several standardized compounds." Since compounds are structurally unique, one compound may link to many substances. CID is PubChem's compound identifier. 4b. PubChem Compound Database - search examples http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_Compound_Database 5a. PubChem Substance http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pcsubstance "Substance -- Individual record object collected from depositors, representing a sample used at bioassay." 5b. PubChem Substance Database - search examples http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_Substance_Database 6. PubChem BioAssay http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pcassay 6b. PubChem BioAssay Database - search and display examples http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_BioAssay_Database 7a. PubChem Structure Search http://pubchem.ncbi.nlm.nih.gov/search/ 7b. PubChem Structure Search Help http://pubchem.ncbi.nlm.nih.gov/search/help_simplesearch.htm 7c. PubChem Advanced Structure Search Help http://pubchem.ncbi.nlm.nih.gov/search/help_search.htm 8. PubChem Structure Search Help. Upload Query File http://pubchem.ncbi.nlm.nih.gov/search/help_simplesearch.htm#EntrezTerm "Most (if not all) standard and common chemical file formats may be used, including "MOL", "SDF" (both v2000 and v3000), "CDX", "SKC", "MOL", "MOL2", "JME", and "SK2". You may also use a text file with your choice of a "SMILES", "SMARTS", or "SLN" string." 9. PubChem Help. Similar Compounds/Substances Link. http://pubchem.ncbi.nlm.nih.gov/help.html#xSimilar_Compounds "The different percent similarities are determined using a Tanimoto score relative to the "binary fingerprint" calculated for two different chemical structures." 10. PubChem Indexes and Index Search http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index 11. PubChem Summary Display http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_Summary 12. PubChem Cross Links http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_Links >1) Go to the PubChem "Compound" or "Substance" pages, depending on whether >you want unique structure records only or all deposited structures. The URLs >are: > >http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pccompound (1 hit for the >InChI below) > >http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pcsubstance >(5 hits for the InChI below) > >2) Paste in your InChI, for example: > >"InChI=1/C17H14O4S/c1-22(19,20)14-9-7-12(8-10-14)15-11-21-17(18)16(15)13-5-3 >-2-4-6-13/h2-10H,11H2,1H3" > >Note that the QUOTES ARE REQUIRED, and there must be no carriage return or >line feed in the string, despite appearances on this email. Note also that >the current text query system does not actually recognize the numbers and >punctuation characters, just the count. It seems to identify the correct >structures most all the time, nonetheless. I am told a proper recognition >system for InChI's (a structure decoder) is in the works. Anyone >interested in this should contact NLM/NCBI directly. Entrez cross-database search page Titled: "Entrez: The life sciences search engine" this page is a useful gateway for searching all NBCI databases including PubMed, PubChem, Genome projects and more. Users can enter terms and click 'GO' to run the search against ALL the databases, OR Click Database Name or Icon to go directly to the Search Page for that database, OR click Question Mark for a short explanation of that database. http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi http://www.emolecules.com/doc/index.htm eMolecules discovers sources of chemical data by searching the internet, and receives submissions from data providers such as chemical suppliers and academic researchers. This is the most comprehensive overhaul of eMolecules since our launch in November 2005. We rebuilt the eMolecules database from the ground up, starting from updated databases and catalogs. Then we added four million new entries from dozens of new sources. In addition, the results pages were redesigned for a cleaner, more compact presentation. Over 5.5 million unique molecules from over 16 million sources Over 500,000 CAS numbers Over 100 chemical suppliers New government and academic databases, such as NIST and NCI, with direct links to their data Tens of thousands of trade names and common names Over 2 million IUPAC and other names We provide links to real molecules, those available for purchase, and to chemical properties databases with real information, whenever possible, and without ambiguity in the stereochemistry.