Cheminformatics OLCC
Draft syllabus – Jordi Cuadros (20141213), based on Belford, R. “Cheminformatics
OLCC Topics 3”
Intended audience
Undergraduate chemistry students (probably not freshman students)
Expected previous experience
Some use of common desktop applications and web services at standard user level.
A semester of organic chemistry or similar and some chemistry lab experience.
Objectives of the course
At the end of the course, the students:
• will know and be able to use the most common formats used to store,
transform and manage chemical information in digital environments;
• will have a basic knowledge of the most common software tools and web services (at
least, those easily available) that current chemists and chemistry researchers use.
The course will not require any computational skills (no programming), nor will it make use of
any computational chemistry tools. It is intended to address what any chemist or chemistry
student should know to use current digital chemistry information resources.
Syllabus
Module 1. Cheminformatics: history and goals
This first module of the course should provide an introduction to the topic of cheminformatics,
including a brief historical introduction and an overview of what will be in the course.
1. Brief historical introduction to cheminformatics
2. Major topics in cheminformatics
2.1. Chemical entities and chemical data
2.2. Chemistry information resources
Module 2. Chemists using common desktop applications
This module should provide some background information on file formats and cover those
parts of desktop applications that chemists may use more extensively than other users.
Common plugins/add-ons to desktop applications for chemistry should come here.
1. Understanding common files and formats
1.1. Saving information in files
1.1.1. Text files versus binary files
1.1.2. Open versus proprietary formats
1.2. Desktop application files
1.2.1. Word processors
1.2.2. Worksheet applications
1.2.3. Presentation applications
1.3. Graphical information
1.3.1. Bitmap files
1.3.1.1. Compressed formats
1.3.2. Vector graphics
1.3.3. Animations
1.3.4. 3D formats
1.4. Web files
2. Chemists’ use of desktop applications
2.1. Making charts
2.2. Adding symbols and equations
2.3. Managing references
2.4. Chemistry add-ons
2.4.1. Chem4Word (http://chem4word.codeplex.com/releases/view/132214)
2.4.2. Chemistry Formatter (http://christopherking.name/ChemFormat/index.html)
2.4.3. JChem for Office (http://www.chemaxon.com/products/jchem-for-office/)
2.4.4. Dotmatics for Office (http://www.dotmatics.com/products/dotmatics-foroffice/)
2.4.5. Insight for Excel (http://accelrys.com/micro/insight/insight-for-excel.html)
2.4.6. TouchMol for Office (http://www.scilligence.com/web/touchmol.aspx)
2.4.7. Others???
Module 3. Identifying chemical entities
I think that reference must be made to drawing conventions and how they dramatically
influence the computer representation of a compound: for example, the IUPAC drawing
standards for compounds (Brecher’s document on representation conventions),
stereoconfiguration representation, Fischer projections, sugar depictions, and the
limitations of MDL Molfile V2000 versus V3000.
1. Chemical entity identifiers
1.1. Names: IUPAC, CAS, Beilstein; variability of systematic names based on settings;
preferred IUPAC names (PINs)
1.2. Formulas
1.2.1. Markush structures
1.3. Line notations
1.3.1. SMILES and related notations
1.3.1.1. SMILES
1.3.1.2. SSMILES
1.3.1.3. Canonical versions of SMILES: variability of canonicalizers
1.3.1.4. SMARTS
1.3.1.5. SLN
1.3.2. International Chemical Identifier (InChI): limitations, layer definitions, stereo
communication
1.3.3. Others: ROSDAL, WLN…
1.4. Non-readable identifiers
1.4.1. Registry numbers
1.4.1.1. CAS RN: check digit
1.4.1.2. CSID
1.4.1.3. PubChemID
1.4.1.4. Others??? Beilstein IDs
1.4.2. InChIKey: should be connected to InChI directly, not separated. How it can be used
in databases.
1.5. Others???
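Although the course itself requires no programming, the CAS RN check digit in item 1.4.1.1 is easy to illustrate. The sketch below assumes the usual rule: the digits before the check digit are weighted 1, 2, 3, … from right to left, and the check digit is the weighted sum modulo 10. The function names are hypothetical and chosen only for this illustration.

```python
def cas_check_digit(body: str) -> int:
    """Return the check digit for the leading segments of a CAS RN, e.g. '7732-18'."""
    digits = [int(ch) for ch in body if ch.isdigit()]
    # The rightmost digit gets weight 1, the next one weight 2, and so on.
    total = sum(weight * d for weight, d in enumerate(reversed(digits), start=1))
    return total % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Check the layout (2-7 digits, 2 digits, 1 digit) and the check digit."""
    parts = cas_rn.strip().split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    for rn in ("7732-18-5", "64-17-5", "7732-18-4"):
        print(rn, is_valid_cas_rn(rn))
```

For water (7732-18-5) the weighted sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, so the check digit is 5. A passing check only shows that the number is well formed, not that it has been assigned to a substance.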
Module 4. Presenting chemistry in 2D
1. Using molecular editors
1.1. ACD/ChemSketch
1.2. Accelrys Draw
1.3. ChemDraw
1.4. Others: JChemPaint, JSME, SketchEl, ChemDoodle…
2. Common file formats for 2D representation of chemical entities
2.1. MDL Molfile V2000 vs V3000
2.2. CML
2.3. Others??? OpenBabel
References
• http://www.gunda.hu/dprogs/
Module 5. Presenting chemistry in 3D
1. Using visualization tools for chemical entities
1.1. Jmol and JSmol
1.2. VMD
1.3. Maestro
1.4. OpenStructure
1.5. Chimera
1.6. Rasmol
1.7. Other???
2. Common file formats for 3D representation of chemical entities
2.1. MDL Mol
2.2. CML
2.3. PDB
2.4. XYZ
2.5. Others???
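Of the 3D formats listed above, XYZ (item 2.4) is simple enough to show in full: the first line holds the atom count, the second a free-text comment, and each following line an element symbol with x, y and z coordinates. The sketch below writes and reads that layout. The function names are hypothetical, the column widths are an arbitrary choice, and the water coordinates are approximate values used only for illustration.

```python
def to_xyz(atoms, comment=""):
    """Serialize (symbol, x, y, z) tuples to XYZ-format text."""
    lines = [str(len(atoms)), comment]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


def parse_xyz(text):
    """Parse XYZ-format text into (comment, list of (symbol, x, y, z))."""
    lines = text.splitlines()
    count = int(lines[0].strip())
    comment = lines[1]
    atoms = []
    for line in lines[2:2 + count]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError("atom count does not match the header line")
    return comment, atoms


if __name__ == "__main__":
    # Approximate, illustrative geometry for water (coordinates in angstroms).
    water = [
        ("O", 0.0, 0.0, 0.1173),
        ("H", 0.0, 0.7572, -0.4692),
        ("H", 0.0, -0.7572, -0.4692),
    ]
    print(to_xyz(water, comment="water, illustrative coordinates"), end="")
```

Note that XYZ stores only positions: bonds, charges and stereo descriptors are absent, which is one reason the other formats in this list exist.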
References
• http://mariovalle.name/ChemViz/tools.html
• http://en.wikipedia.org/wiki/List_of_molecular_graphics_systems
Module 6. Comparing and searching chemical entities
1. Using chemical entities databases
1.1. ChemSpider
1.2. PubChem
1.3. NIST Chemistry WebBook
1.4. ZINC
1.5. ChemExper
1.6. ChemIDPlus
1.7. ChEBI
1.8. Others???
2. Understanding chemical searches
2.1. Exact search
2.2. Substructure search
2.3. Similarity search
2.3.1. Fingerprints
2.3.2. Distance measurements
2.4. Virtual screening
3. Using the databases programmatically
3.1. APIs
3.2. Web scraping
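To make items 2.3.1 and 2.3.2 concrete, the sketch below computes the Tanimoto (Jaccard) coefficient, a commonly used similarity measure for binary fingerprints. It assumes each fingerprint is given as the set of indices of its "on" bits. The toy fingerprints are invented for illustration and do not come from any real fingerprinting scheme, and returning 1.0 for two empty fingerprints is a convention chosen here, not a universal rule.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    # Shared on-bits divided by the on-bits present in either fingerprint.
    return shared / (len(fp_a) + len(fp_b) - shared)


def tanimoto_distance(fp_a, fp_b):
    """A distance derived from the similarity: 0 for identical, 1 for disjoint."""
    return 1.0 - tanimoto(fp_a, fp_b)


if __name__ == "__main__":
    # Hypothetical on-bit sets for a query and two database entries.
    query = {1, 2, 3, 5}
    candidates = {"entry A": {2, 3, 5, 8}, "entry B": {4, 6, 7}}
    ranked = sorted(candidates, key=lambda k: tanimoto(query, candidates[k]), reverse=True)
    for name in ranked:
        print(name, round(tanimoto(query, candidates[name]), 3))
```

A similarity search then amounts to ranking database entries by this score against the query and keeping those above a chosen threshold.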
References
• http://depth-first.com/articles/2011/10/12/sixty-four-free-chemistry-databases/
• http://www.dmoz.org/Science/Chemistry/Chemical_Databases/
• http://cds.rsc.org/externalresources.asp
Module 7. Representing and searching chemical reactions
1. Using chemical reaction databases
1.1. WebReactions
1.2. CASREACT
1.3. Rhea
1.4. Others??? SPRESI, Methods of Organic Synthesis, Reaxys
2. Computer-assisted organic synthesis (CAOS) applications?
References
• http://www.organicworldwide.net/content/reaction-databases
Module 8. Representing and managing digital spectra
1. Applications to process and review spectral data: ACD/NMR Processor, Mestrelab Mnova
2. Using applications to visualize and analyze spectral data
2.1. JSpecView (Java applet and JavaScript version)
2.2. ChemDoodle Spectral viewer
2.3. JDXView
2.4. ACD/Labs spectral viewer
2.5. Others???
3. Spectra databases
3.1. SDBS
3.2. Learn Chemistry SpectraSchool
3.3. ChemSpider
3.4. NMRDB
3.5. NIST Chemistry WebBook
3.6. Others???
4. Common file formats for chemical spectra: binary vs standard file formats; loss of
information (phasing, referencing, analysis, etc.)
4.1. JCAMP-DX
4.2. AnIML
4.3. CML
4.4. Others??? mzML for mass spectrometry, SPC for IR; complexities of multi-dimensional
data handling; assigning data for structure-spectra correlations
Module 9. Identifying chemistry information resources
1. Documentation identifiers
1.1. ISBN, ISSN, ISAN
1.2. DOI
1.3. URI
2. Author identifiers
2.1. ORCID
2.2. ResearcherID
2.3. Others???
3. Managing resources with applications and webservices
3.1. Mendeley
3.2. Zotero
3.3. EndNote
3.4. RefWorks
3.5. Others???
4. Searching chemical information
4.1. SciFinder
4.2. ChemSpider
4.3. PubMed / PubChem
4.4. Web of Science
4.5. Scopus
4.6. Google Scholar
4.7. Others???
5. Classification of chemical information and the semantic web
5.1. What is the semantic web?
5.2. Keywords and metadata
5.2.1. Metadata formats
5.2.1.1. Dublin Core
5.2.1.2. Others???
5.2.2. How to add metadata to digital resources
5.3. Thesauri and other controlled vocabularies
5.4. Ontologies
5.4.1. Web Ontology Language (OWL)
5.4.2. Chemistry-relevant ontologies
5.4.2.1. ChEBI
5.4.2.2. RSC Ontologies
5.4.2.3. Chemical Information Ontology
5.4.2.4. Others???
Module 10. Understanding and managing rights on digital information
1. How does IP protection work?
2. Types of licenses
2.1. Public domain
2.2. Open source
2.3. Creative Commons
3. When to cite and how to cite digital resources
3.1. Websites
3.2. Databases
3.3. Software
3.4. Multimedia objects