SMILES
• Simplified Molecular Input Line Entry System (SMILES)
• Widely used and computationally efficient
• Uses atomic symbols and a small set of intuitive rules
• Uses hydrogen-suppressed molecular graphs (HSMG)

SMILES Bonds
• Single: - (may be omitted)
• Double: =
• Triple: #
• Aromatic: : (may be omitted)

Butanols
(Figure: structures of 2-Butanol, iso-Butanol and tert-Butanol)

SMILES Branches
• Represented by enclosure in parentheses
• Can be nested or stacked
• Examples:
  – CC(O)CC is 2-Butanol
  – OCC(C)C is iso-Butanol
  – OC(C)(C)C is tert-Butanol

SMILES Bonds: Chloroethenes
• Ethene: C=C
• Chloroethene: ClC=C
• 1,1-Dichloroethene: ClC(Cl)=C
• cis-1,2-Dichloroethene: Cl/C=C\Cl (plain ClC=CCl leaves the cis/trans configuration unspecified)
• Trichloroethene: ClC(Cl)=CCl
• Perchloroethene: ClC(Cl)=C(Cl)Cl

SMILES Atoms
• Use normal chemical symbols
• Add punctuation symbols if necessary
• No superscripts or subscripts

SMILES Symbols
• A string of alphanumeric characters and certain punctuation symbols
• Terminates at the first space encountered when read left to right
• The ORGANIC SUBSET: B, C, N, O, P, S, F, Cl, Br, I (these may be written without square brackets)

Other SMILES Atoms
• Aliphatic (nonaromatic) carbon: C
• Atom in an aromatic ring: lowercase letter
• Ring closures are designated by pairs of matching digits, e.g. c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas C1CCCCC1 is Cyclohexane

SMILES Charges
• Specify attached hydrogens and charges in square brackets
• The number of attached hydrogens is the symbol H followed by an optional digit
• Examples:
  – [H+]: proton
  – [OH-]: hydroxide anion
  – [OH3+]: hydronium cation
  – [Fe++]: iron(II) cation (equivalently [Fe+2])
  – [NH4+]: ammonium cation

SMILES Cyclic Structures
• Break one single or one aromatic bond in each ring
• Number the rings in any order
  – Designate the two ring-breaking atoms by the same digit following the atomic symbol
• The same number marks the start and the end of a ring and is entered immediately after the start atom and after the end atom
• Single digits 1–9 are normally used (higher ring numbers are written with a percent sign, e.g. %10)
• A number should appear only twice (once to open and once to close its ring)
• An atom can be associated with two consecutive numbers, e.g. Naphthalene: c12ccccc1cccc2

SMILES Conventions
• Avoid two consecutive left parentheses if possible
• Strive for the fewest possible branches
• Tautomeric bonds are not designated; enter the appropriate tautomeric form

Further Restrictions
• A branch cannot begin a SMILES notation
• A branch cannot immediately follow a double- or triple-bond symbol
• Example: C=(CC)C is invalid, but C(=CC)C and C(CC)=C are valid SMILES

SMILES Fragments
• Nitro: N(=O)(=O)
• Nitrate: ON(=O)(=O)
• Nitrite: ON(=O)
• Sulfonic acid: S(=O)(=O)O
• Cyanide/Nitrile: C#N
• Azide: N=N#N
• Azido (charge-separated form): N=[N+]=[N-]

SMILES Metals
• Elements outside the organic subset, including metals, are written in square brackets:
  [Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]

Disconnected Structures
• Indicated by a dot (.)
• Tetramethylammonium bromide: C[N+](C)(C)C.[Br-]

Isomeric and Chiral SMILES
• Cis/trans configuration is indicated by forward and backward slashes: / \
• Examples:
  – trans-1,2-Dibromoethene: Br/C=C/Br (the slash direction stays the same)
  – cis-1,2-Dibromoethene: Br/C=C\Br (the slash direction reverses)
• Chirality is indicated by the “@” symbol (@ for anticlockwise, @@ for clockwise)

Some Applications
• JMDraw/SMILESViewer (Christoph Steinbeck)
• JME Molecular Editor (Peter Ertl)
• STN Express (SMILES as output)
• Tripos (dbtranslate: SMILES to MOL)
• Marvin (Ferenc Csizmadia) http://chemaxon.com/marvin/
• CACTVS http://www2.ccc.uni-erlangen.de/cactvs/

Another Application
• SMILESCAS Database http://www.syrres.com/esc/smilecas.htm
  – Over 103,000 SMILES notations
  – Input a CAS Registry Number
  – Leads to the SMILES notation and thence to a structure search
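The branch, ring-closure and "Further Restrictions" rules above can be checked in code. The following is a minimal Python sketch, not a complete SMILES parser. Its token pattern covers only organic-subset atoms, bracket atoms, bond symbols, branches, ring numbers and the dot. The names `tokenize` and `validate` are hypothetical. The sketch also assumes that the no-branch-after-bond rule applies to every bond symbol, not only to = and #.

```python
import re

# Simplified token pattern (an assumption, not the full Daylight grammar).
TOKEN_RE = re.compile(
    r"\[[^\]]+\]"       # bracket atom, e.g. [NH4+]
    r"|Cl|Br"           # two-letter organic-subset atoms
    r"|[BCNOPSFI]"      # one-letter organic-subset atoms
    r"|[bcnops]"        # aromatic atoms (lowercase)
    r"|[-=#:/\\]"       # bond symbols, including cis/trans slashes
    r"|[()]"            # branch open / close
    r"|%\d\d|\d"        # ring-closure numbers (%nn for two digits)
    r"|\."              # disconnected-structure separator
)
BONDS = set("-=#:/\\")


def tokenize(smiles: str) -> list:
    """Split a SMILES string into tokens; the notation ends at the first space."""
    smiles = smiles.split(" ", 1)[0]
    tokens, pos = [], 0
    while pos < len(smiles):
        m = TOKEN_RE.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def validate(smiles: str) -> bool:
    """Check parentheses, ring-number pairing and the branch restrictions."""
    depth = 0
    open_rings = set()
    prev = None  # previous token; None at the start of the string
    for tok in tokenize(smiles):
        if tok == "(":
            if prev is None or prev == ".":
                raise ValueError("a branch cannot begin a SMILES notation")
            if prev in BONDS:
                raise ValueError("a branch cannot immediately follow a bond symbol")
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
        elif tok[0].isdigit() or tok[0] == "%":
            # First occurrence opens the ring, the second occurrence closes it.
            open_rings ^= {tok}
        prev = tok
    if prev in BONDS:
        raise ValueError("a SMILES notation cannot end with a bond symbol")
    if depth:
        raise ValueError("unclosed '('")
    if open_rings:
        raise ValueError(f"unclosed ring number(s): {sorted(open_rings)}")
    return True
```

For example, `validate("c12ccccc1cccc2")` returns True for naphthalene. `validate("C=(CC)C")` raises ValueError because a branch follows a bond symbol.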
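Bracket atoms such as those on the SMILES Charges slide can be decoded into element, hydrogen count and charge. This is a minimal Python sketch that assumes a simplified bracket grammar: an optional isotope, the element symbol, an optional @ or @@, H with an optional digit, and a charge. The charge may be written as +, ++, -, -- or as a sign followed by a number. Atom classes such as `[CH3:1]` are not handled. The name `parse_bracket_atom` is hypothetical.

```python
import re
from typing import Optional

# Simplified bracket-atom grammar (an assumption for illustration).
BRACKET_RE = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"   # element or aromatic symbol
    r"(?P<chiral>@@?)?"                      # @ = anticlockwise, @@ = clockwise
    r"(?P<h>H(?P<hcount>\d)?)?"              # attached hydrogens
    r"(?P<charge>[+-]\d+|\++|-+)?\]"         # +2, ++, -, ...
)


def parse_charge(text: Optional[str]) -> int:
    """'+' -> 1, '++' -> 2, '-' -> -1, '+2' -> 2; no charge -> 0."""
    if not text:
        return 0
    sign = 1 if text[0] == "+" else -1
    if text[1:].isdigit():
        return sign * int(text[1:])
    return sign * len(text)


def parse_bracket_atom(token: str) -> dict:
    m = BRACKET_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {token!r}")
    if m["h"] is None:
        hcount = 0
    else:
        # 'H' alone means one hydrogen; 'H4' means four.
        hcount = int(m["hcount"]) if m["hcount"] else 1
    return {
        "symbol": m["symbol"],
        "isotope": int(m["isotope"]) if m["isotope"] else None,
        "chirality": m["chiral"],
        "hcount": hcount,
        "charge": parse_charge(m["charge"]),
    }
```

For example, `[NH4+]` yields symbol N, four hydrogens and charge +1. `[Fe++]` and `[Fe+2]` both yield charge +2.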
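Because SMILES uses hydrogen-suppressed graphs, the hydrogens on an unbracketed organic-subset atom are implied by its normal valence. This Python sketch uses a hypothetical function, `molecular_formula`. It fills each atom up to its lowest normal valence that is at least its explicit bond total, then reports a Hill-order formula. The valence table is an assumption: B 3, C 4, N 3 or 5, O 2, P 3 or 5, S 2, 4 or 6, halogens 1. The sketch also assumes well-formed, uncharged, non-aromatic input without bracket atoms or %nn ring numbers. It is not a substitute for a cheminformatics toolkit.

```python
import re
from collections import Counter

# Assumed normal valences for unbracketed organic-subset atoms.
NORMAL_VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
                   "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3, "/": 1, "\\": 1}
# Deliberately small token set: no brackets, no aromatic atoms, no %nn.
TOKEN_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#/\\]|[()]|\d|\.")


def molecular_formula(smiles: str) -> str:
    atoms = []         # element symbol of each atom
    bond_sum = []      # sum of explicit bond orders per atom
    branch_stack = []  # atom to return to at each ')'
    rings = {}         # ring digit -> (opening atom, bond order)
    prev = None        # atom that the next atom bonds to
    order = 1          # order of the next bond (single by default)
    pos = 0
    while pos < len(smiles):
        m = TOKEN_RE.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character {smiles[pos]!r} at {pos}")
        tok, pos = m.group(), m.end()
        if tok in BOND_ORDERS:
            order = BOND_ORDERS[tok]
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok == ".":
            prev = None
        elif tok.isdigit():
            if tok in rings:  # closing digit: bond back to the opening atom
                start, start_order = rings.pop(tok)
                ring_order = max(order, start_order)  # either end may give the symbol
                bond_sum[start] += ring_order
                bond_sum[prev] += ring_order
            else:
                rings[tok] = (prev, order)
            order = 1
        else:  # an atom
            atoms.append(tok)
            bond_sum.append(0)
            if prev is not None:
                bond_sum[prev] += order
                bond_sum[-1] += order
            prev = len(atoms) - 1
            order = 1
    if rings:
        raise ValueError(f"unclosed ring number(s): {sorted(rings)}")

    counts = Counter(atoms)
    for symbol, used in zip(atoms, bond_sum):
        # Implicit hydrogens fill the atom up to its lowest valence >= bonds used.
        valence = next((v for v in NORMAL_VALENCES[symbol] if v >= used), used)
        counts["H"] += valence - used

    # Hill order: C, then H, then the rest alphabetically (all alphabetical if no C).
    elements = [e for e in counts if counts[e] > 0]
    head = (["C"] + (["H"] if "H" in elements else [])) if "C" in elements else []
    rest = sorted(e for e in elements if e not in head)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in head + rest)
```

With this sketch, the three butanol SMILES from the Branches slide all give C4H10O, which is consistent with their being isomers.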
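The slash rule on the Isomeric and Chiral SMILES slide can be shown with a deliberately narrow Python sketch. The rule is: the same slash direction means trans, and a reversed direction means cis. The sketch accepts only strings of the form X/C=C/Y, where X and Y are single organic-subset atoms. The name `double_bond_config` is hypothetical. General stereo perception is left to full toolkits.

```python
import re

# Only the simplest pattern is handled (an assumption for illustration):
# one substituent, a slash, C=C, a slash, one substituent, e.g. Br/C=C\Br.
SIMPLE_ALKENE_RE = re.compile(
    r"(Cl|Br|[BCNOPSFI])([/\\])C=C([/\\])(Cl|Br|[BCNOPSFI])"
)


def double_bond_config(smiles: str) -> str:
    """Return 'trans' if both slashes point the same way, otherwise 'cis'."""
    m = SIMPLE_ALKENE_RE.fullmatch(smiles)
    if m is None:
        raise ValueError("only X/C=C/Y-style strings are handled")
    first_slash, second_slash = m.group(2), m.group(3)
    return "trans" if first_slash == second_slash else "cis"
```

For example, `double_bond_config("Br/C=C/Br")` returns "trans" and `double_bond_config("Br/C=C\\Br")` returns "cis". The doubled backslash is only Python string escaping.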