The SPECTRa Project: A wider chemistry picture
Alan Tonge & Jim Downing
A Digital Repository for the Chemical Community

Project Overview
• 18-month project between University of Cambridge and Imperial College London to develop customized tools to deposit chemistry data in digital repositories
• Part of the JISC Digital Repositories programme
• Closely integrated with eBank and eCrystals (Bath and Southampton)

Requirements
Requirements in a number of different user disciplines, determined by interview and survey:
• synthetic organic chemistry
• departmental crystallography services
• computational chemistry

The Problem
Science depends upon data
Experimental chemistry data is a resource / asset …
• Proprietary spectra formats (NMR, IR, UV): 5-year shelf life
• PDF image files (supplementary data): not machine readable
• CIF (X-ray): 90% remain unpublished
… most of which is lost or becomes unreadable
John Davies has 3000 unpublished structures. 3000 × £300 cost per structure ≈ £1M
Most of the problems are social, not technical

The Solution
• Capture selected data from chemistry workflows in open format (JCAMP, MOL, CIF)
• Add context-specific metadata + persistent identifiers
• Deposit in digital repository
• New feature: (controlled) public release
• Internet user search tools
• OAI-PMH metadata harvesting

[Diagram]
Inputs: Computational Chemistry Calculations, NMR Spectra, 2D Chemical Structures, 3D X-ray Structures
SPECTRa Deposit Tools: create CML, InChI, metadata
Repositories: DSpace Open, DSpace Escrow
OAI-PMH Harvesting
SPECTRa Search Tools

InChI:
    InChI=1/C8H8O/c1-7(9)8-5-3-2-4-6-8/h2-6H,1H3

CML:
    <molecule xmlns="http://www.xml-cml.org/schema">
      <atomArray>
        <atom id="a1" elementType="C" x2="-0.380600" y2="-0.720800"/>
        <atom id="a2" elementType="C" x2="-0.381800" y2="-1.548200"/>
        <atom id="a3" elementType="C" x2="0.333100" y2="-1.961000"/>
        <atom id="a4" elementType="C" x2="1.049500" y2="-1.547700"/>
        <atom id="a5" elementType="C" x2="1.046600" y2="-0.717200"/>
        <atom id="a6" elementType="C" x2="0.331300" y2="-0.308000"/>
        <atom id="a7" elementType="C" x2="1.759600" y2="-0.302000"/>
        <atom id="a8" elementType="C" x2="2.475600" y2="-0.711800"/>
        <atom id="a9" elementType="O" x2="1.756400" y2="0.523000"/>
      </atomArray>
      <bondArray>
        <bond atomRefs2="a4 a5" order="1"/>
        <bond atomRefs2="a2 a3" order="1"/>
        <bond atomRefs2="a5 a6" order="2"/>
        <bond atomRefs2="a6 a1" order="1"/>
        <bond atomRefs2="a1 a2" order="2"/>
        <bond atomRefs2="a5 a7" order="1"/>
        <bond atomRefs2="a3 a4" order="2"/>
        <bond atomRefs2="a7 a8" order="1"/>
        <bond atomRefs2="a7 a9" order="2"/>
      </bondArray>
    </molecule>

Acknowledgements
• Project Director: Peter Morgan, UL Cambridge
• Chemistry leads: Henry Rzepa, Peter Murray-Rust
• Project Officers: Fiona Cotterill, Jim Downing
• Project Manager: Alan Tonge
• Library Liaison: Janet Evans, Lorraine Windsor

http://www.lib.cam.ac.uk/spectra/
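
Worked example: the CML fragment and the InChI string on the deposit-tools slide describe the same molecule, and that can be checked mechanically. The sketch below is not the SPECTRa code. It is a minimal illustration that parses the CML with the Python standard library, fills in implicit hydrogens from standard valences (an assumption: C = 4, O = 2, neutral atoms only), and compares the resulting formula with the formula layer of the InChI. The function name hill_formula and the VALENCE table are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the CML schema, as used on the <molecule> element.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Assumed standard valences; enough for this fragment, not a general table.
VALENCE = {"C": 4, "O": 2}

INCHI = "InChI=1/C8H8O/c1-7(9)8-5-3-2-4-6-8/h2-6H,1H3"

CML = """<molecule xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="C" x2="-0.380600" y2="-0.720800"/>
    <atom id="a2" elementType="C" x2="-0.381800" y2="-1.548200"/>
    <atom id="a3" elementType="C" x2="0.333100" y2="-1.961000"/>
    <atom id="a4" elementType="C" x2="1.049500" y2="-1.547700"/>
    <atom id="a5" elementType="C" x2="1.046600" y2="-0.717200"/>
    <atom id="a6" elementType="C" x2="0.331300" y2="-0.308000"/>
    <atom id="a7" elementType="C" x2="1.759600" y2="-0.302000"/>
    <atom id="a8" elementType="C" x2="2.475600" y2="-0.711800"/>
    <atom id="a9" elementType="O" x2="1.756400" y2="0.523000"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a4 a5" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
    <bond atomRefs2="a5 a6" order="2"/>
    <bond atomRefs2="a6 a1" order="1"/>
    <bond atomRefs2="a1 a2" order="2"/>
    <bond atomRefs2="a5 a7" order="1"/>
    <bond atomRefs2="a3 a4" order="2"/>
    <bond atomRefs2="a7 a8" order="1"/>
    <bond atomRefs2="a7 a9" order="2"/>
  </bondArray>
</molecule>"""


def hill_formula(cml_text):
    """Return the Hill-order formula of a CML molecule with implicit hydrogens."""
    root = ET.fromstring(cml_text)

    # Map atom id -> element symbol.
    element_of = {
        atom.get("id"): atom.get("elementType")
        for atom in root.iterfind("cml:atomArray/cml:atom", CML_NS)
    }

    # Sum the bond orders attached to each atom.
    bond_order_sum = Counter()
    for bond in root.iterfind("cml:bondArray/cml:bond", CML_NS):
        first, second = bond.get("atomRefs2").split()
        order = int(bond.get("order"))
        bond_order_sum[first] += order
        bond_order_sum[second] += order

    # Heavy-atom counts, plus hydrogens needed to satisfy each valence.
    counts = Counter(element_of.values())
    counts["H"] += sum(
        VALENCE[element] - bond_order_sum[atom_id]
        for atom_id, element in element_of.items()
    )

    # Hill order: carbon, then hydrogen, then the rest alphabetically.
    ordered = [e for e in ("C", "H") if counts[e]]
    ordered += sorted(e for e in counts if e not in ("C", "H") and counts[e])
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in ordered)


# The formula layer is the second "/"-separated field of the InChI string.
inchi_formula = INCHI.split("/")[1]
print(hill_formula(CML), inchi_formula)
```

Both values come out as C8H8O. A check of this kind only confirms that the two representations agree on composition; it does not compare connectivity, which is what the remaining InChI layers encode.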