Cheminformatics OLCC
Draft syllabus – Jordi Cuadros (20141213), based on Belford, R. “Cheminformatics OLCC Topics 3”

Intended audience
Undergraduate chemistry students (probably not freshmen)

Expected previous experience
Some use of common desktop applications and web services at a standard user level. A semester of organic chemistry or similar, and some chemistry lab experience.

Objectives of the course
At the end of the course, the students:
- will know and be able to use the most common formats used to store, transform and manage chemical information in digital environments;
- will have a basic knowledge of the most common software tools and web services (at least, those easily available) that current chemists and chemistry researchers use.

The course will not require any computational skills (no programming), nor will it make use of computational chemistry tools. It is intended to address what any chemist or chemistry student should know in order to use current digital chemistry information resources.

Syllabus

Module 1. Cheminformatics: history and goals
This first module of the course should provide an introduction to the topic of cheminformatics, including a brief historical introduction and an overview of what will be covered in the course.
1. Brief historical introduction to cheminformatics
2. Major topics in cheminformatics
  2.1. Chemical entities and chemical data
  2.2. Chemistry information resources

Module 2. Chemists using common desktop applications
This module should provide some background information on file formats and cover those parts of desktop applications that chemists may use more extensively than other users. Common plugins/add-ons to desktop applications for chemistry should come here.
1. Understanding common files and formats
  1.1. Saving information in files
    1.1.1. Text files versus binary files
    1.1.2. Open versus proprietary formats
  1.2. Desktop application files
    1.2.1. Word processors
    1.2.2. Spreadsheet applications
    1.2.3. Presentation applications
  1.3. Graphical information
    1.3.1. Bitmap files
      1.3.1.1. Compressed formats
    1.3.2. Vector graphics
    1.3.3. Animations
    1.3.4. 3D formats
  1.4. Web files
2. Chemists' use of desktop applications
  2.1. Making charts
  2.2. Adding symbols and equations
  2.3. Managing references
  2.4. Chemistry add-ons
    2.4.1. Chem4Word (http://chem4word.codeplex.com/releases/view/132214)
    2.4.2. Chemistry Formatter (http://christopherking.name/ChemFormat/index.html)
    2.4.3. JChem for Office (http://www.chemaxon.com/products/jchem-for-office/)
    2.4.4. Dotmatics for Office (http://www.dotmatics.com/products/dotmatics-foroffice/)
    2.4.5. Insight for Excel (http://accelrys.com/micro/insight/insight-for-excel.html)
    2.4.6. TouchMol for Office (http://www.scilligence.com/web/touchmol.aspx)
    2.4.7. Others???

Module 3. Identifying chemical entities
Note: I think reference must be made to drawing conventions and to how strongly they influence the computer representation of a compound. Examples: the IUPAC drawing standards for compounds (Brecher’s document on representation conventions), stereoconfiguration representation, Fischer representations, sugar depictions, and the limitations of Molfile V2000 vs. V3000.
1. Chemical entity identifiers
  1.1. Names: IUPAC, CAS, Beilstein; variability of systematic names depending on settings; preferred IUPAC names (PINs)
  1.2. Formulas
    1.2.1. Markush structures
  1.3. Line notations
    1.3.1. SMILES and related notations
      1.3.1.1. SMILES
      1.3.1.2. SSMILES
      1.3.1.3. Canonical versions of SMILES: variability of canonicalizers
      1.3.1.4. SMARTS
      1.3.1.5. SLN
    1.3.2. International Chemical Identifier (InChI): limitations, layer definitions, stereo communication
    1.3.3. Other: ROSDAL, WLN…
  1.4. Non-readable identifiers
    1.4.1. Registry numbers
      1.4.1.1. CAS RN: check digit
      1.4.1.2. CSID
      1.4.1.3. PubChemID
      1.4.1.4. Others??? Beilstein IDs
    1.4.2. InChIKey (note: should be presented together with InChI, not separately; how it can be used in databases)
  1.5. Others???

Module 4. Presenting chemistry in 2D
1. Using molecular editors
  1.1. ACD/ChemSketch
  1.2. Accelrys Draw
  1.3. ChemDraw
  1.4. Others: JChemPaint, JSME, SketchEl, ChemDoodle…
2. Common file formats for 2D representation of chemical entities
  2.1. MDL Molfile V2000 vs. V3000
  2.2. CML
  2.3. Others??? OpenBabel
References
http://www.gunda.hu/dprogs/

Module 5. Presenting chemistry in 3D
1. Using visualization tools for chemical entities
  1.1. Jmol and JSmol
  1.2. VMD
  1.3. Maestro
  1.4. OpenStructure
  1.5. Chimera
  1.6. RasMol
  1.7. Others???
2. Common file formats for 3D representation of chemical entities
  2.1. MDL Mol
  2.2. CML
  2.3. PDB
  2.4. XYZ
  2.5. Others???
References
http://mariovalle.name/ChemViz/tools.html
http://en.wikipedia.org/wiki/List_of_molecular_graphics_systems

Module 6. Comparing and searching chemical entities
1. Using chemical entity databases
  1.1. ChemSpider
  1.2. PubChem
  1.3. NIST Chemistry WebBook
  1.4. ZINC
  1.5. ChemExper
  1.6. ChemIDplus
  1.7. ChEBI
  1.8. Others???
2. Understanding chemical searches
  2.1. Exact search
  2.2. Substructure search
  2.3. Similarity search
    2.3.1. Fingerprints
    2.3.2. Distance measurements
  2.4. Virtual screening
3. Using the databases programmatically
  3.1. APIs
  3.2. Web scraping
References
http://depth-first.com/articles/2011/10/12/sixty-four-free-chemistry-databases/
http://www.dmoz.org/Science/Chemistry/Chemical_Databases/
http://cds.rsc.org/externalresources.asp

Module 7. Representing and searching chemical reactions
1. Using chemical reaction databases
  1.1. WebReactions
  1.2. CASREACT
  1.3. Rhea
  1.4. Others??? SPRESI, Methods of Organic Synthesis, Reaxys
2. CAOS applications?
References
http://www.organicworldwide.net/content/reaction-databases

Module 8. Representing and managing digital spectra
1. Applications to process and review spectral data: ACD/NMR Processor, Mestrelab Mnova
2. Using applications to visualize and analyze spectral data
  2.1. JSpecView (Java applet and JavaScript version)
  2.2. ChemDoodle spectral viewer
  2.3. JDXView
  2.4. ACD/Labs spectral viewer
  2.5. Others???
3. Spectra databases
  3.1. SDBS
  3.2. Learn Chemistry SpectraSchool
  3.3. ChemSpider
  3.4. NMRDB
  3.5. NIST Chemistry WebBook
  3.6. Others???
4. Common file formats for chemical spectra: binary vs. standard file formats; loss of information (phasing, referencing, analysis, etc.)
  4.1. JCAMP-DX
  4.2. AnIML
  4.3. CML
  4.4. Others??? mzML for mass spectrometry, SPC for IR; complexities of multi-dimensional data handling; assigning data for structure-spectra correlations

Module 9. Identifying chemistry information resources
1. Documentation identifiers
  1.1. ISBN, ISSN, ISAN
  1.2. DOI
  1.3. URI
2. Author identifiers
  2.1. ORCID
  2.2. ResearcherID
  2.3. Others???
3. Managing resources with applications and web services
  3.1. Mendeley
  3.2. Zotero
  3.3. EndNote
  3.4. RefWorks
  3.5. Others???
4. Searching chemical information
  4.1. SciFinder
  4.2. ChemSpider
  4.3. PubMed / PubChem
  4.4. Web of Science
  4.5. Scopus
  4.6. Google Scholar
  4.7. Others???
5. Classification of chemical information and the semantic web
  5.1. What is the semantic web?
  5.2. Keywords and metadata
    5.2.1. Metadata formats
      5.2.1.1. Dublin Core
      5.2.1.2. Others???
    5.2.2. How to add metadata to digital resources
  5.3. Thesauri and other controlled vocabularies
  5.4. Ontologies
    5.4.1. Web Ontology Language (OWL)
    5.4.2. Chemistry-relevant ontologies
      5.4.2.1. ChEBI
      5.4.2.2. RSC Ontologies
      5.4.2.3. Chemical Information Ontology
      5.4.2.4. Others???

Module 10. Understanding and managing rights on digital information
1. How does IP protection work?
2. Types of licenses
  2.1. Public domain
  2.2. Open source
  2.3. Creative Commons
3. When to cite and how to cite digital resources
  3.1. Websites
  3.2. Databases
  3.3. Software
  3.4. Multimedia objects
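
Illustrative note for Module 3, item 1.4.1.1 (CAS RN check digit)
The course itself requires no programming, so the following is only an optional sketch for instructors who want to show how the check digit of a CAS Registry Number can be verified. It assumes the usual rule: the digits before the final hyphen are weighted 1, 2, 3, … from right to left, and the sum modulo 10 must equal the last digit. The function names are hypothetical and not taken from any library.

```python
def cas_check_digit(body_digits: str) -> int:
    """Return the check digit for the digits of a CAS RN without hyphens
    and without the final check digit (e.g. "773218" for water)."""
    # Weight the rightmost digit by 1, the next by 2, and so on.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Check the format (2-7 digits, 2 digits, 1 digit) and the check digit."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    # Water, ethanol, caffeine, and a deliberately wrong check digit.
    for rn in ("7732-18-5", "64-17-5", "58-08-2", "7732-18-4"):
        print(rn, is_valid_cas_rn(rn))
```

For water (7732-18-5) the sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, which matches the final digit. The same calculation can be done by hand or in a spreadsheet if no programming is desired.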