The IUCr Diffraction Data Deposition Working Group
Activities since IUCr Madrid
John R. Helliwell, Tom Terwilliger and Brian McMahon
john.helliwell@manchester.ac.uk terwilliger@lanl.gov bm@iucr.org

Members of the IUCr Diffraction Data Deposition Working Group

WG Members
Steve Androulakis (TARDIS representative)
Sol Gruner (Diffuse scattering specialist and SR Facility Director)
John R. Helliwell (Chair) (IUCr ICSTI Representative; Chairman, IUCr Journals Commission 1996-2005)
Loes Kroon-Batenburg (Data processing software)
Brian McMahon (IUCr CODATA Representative)
Tom Terwilliger (Representative of the Commission on Biological Macromolecules)
John Westbrook (wwPDB representative and COMCIFS)
Heinz-Josef Weyer (SR and Neutron Facility user)

By invitation
Chairs and delegates of IUCr Commissions

Consultants
Alun Ashton (Diamond Light Source (DLS); Data Archive leader there)
Herbert Bernstein (Head of the imgCIF Dictionary Maintenance Group and member of COMCIFS)
Frances Bernstein (Observer on data deposition policies)
Gerard Bricogne (Active software and methods developer)
Bernhard Rupp (Macromolecular crystallographer)

Terms of reference
• It is becoming increasingly important to deposit the raw data from scattering experiments.
• A lot of valuable information gets lost when only structure factors are deposited.
• A number of research centres, e.g. synchrotron and neutron facilities, are fully aware of the need and have established detector working groups addressing this issue.

Key terminology
• Raw data
• Processed data
• Derived data

The data publication pyramid - the publishers' view
Reilly, S., Schallier, W., Schrimpf, S., Smit, E. & Wilkinson, W. (2011). Report on Integration of Data and Publications.
Available from http://www.stm-assoc.org/integration-of-data-and-publications/

Data flow in crystallography
[Diagram. Labels recovered from the figure:]
• Workflow stages: Experiment (synchrotron or laboratory); Data reduction; Structure solution and refinement (laboratory)
• Data categories: Raw experimental data; Reduced/processed data; Derived data
• Destinations: IUCr journals; Other journals; Chemistry databases (CCDC); Biological structure databases (PDB)
• Status labels: retained by scientist; archived at facility (~6 months); deposited; validated; published/disseminated

Publication flow in IUCr journals
[Diagram. Labels recovered from the figure:]
• Workflow stages: Experiment (synchrotron or laboratory); Data reduction; Structure solution and refinement (laboratory); Author; IUCr journals (Validation, Peer review, Technical editing, Publication)
• Data categories: Raw experimental data; Reduced/processed data; Derived data
• Files: raw data (imgCIF); structure factors (.fcf); CIF file (.cif); .xml, .rdf etc.; .pdf; .html
• Published article: preprint .pdf .html; article of record .sgml
• Destinations: Chemistry databases (CCDC); Bibliographic databases (ISI, etc.)

Cost and benefits analyses versus obligations on researchers
• Archiving raw data has long been out of reach as a viable option, but is now perceived as within reach by a variety of research fields and learned societies (e.g. ICSU, CODATA, Royal Society)
• IUCr Journals has thus far ‘encouraged’ authors to retain raw data
• Funding agencies believe their PIs are carefully retaining their ‘data’

Activities since Madrid
• Organization and Communications
– Core working group
– Consultation group (dddwg@iucr.org list)
– Public forum (http://forums.iucr.org)
– Planning for workshop/briefings at Regional meetings

The IUCr DDD Forum
• http://forums.iucr.org/
• Has provided a focus for documents and some discussion
• Some documents have received around 500 views

Activities since Madrid
• Organization and Communications
• Analysis and Discussion
– CCP4BB discussion: engagement and summary
– ICSTI Insights articles
– Data comparison paper (Kroon-Batenburg et al.)
– Survey of facilities by IUCr Commission on Synchrotron Radiation

ICSTI Insights
1. The Living Publication has existed for many years for crystallographers
John R. Helliwell and Brian McMahon
2. Continuous improvement of macromolecular crystal structures
Thomas C. Terwilliger
3. Should the crystallographic community require the archiving of raw diffraction data from a crystal, a fibre or a solution?
John R. Helliwell and Brian McMahon

Activities since Madrid
• Organization and Communications
• Analysis and Discussion
• Pilot Experiments
– University of Manchester repository
– Diamond data sets DOI registration

Activities since Madrid
• Organization and Communications
• Analysis and Discussion
• Pilot Experiments
• Ongoing efforts
– imgCIF/CBF development
– PaNdata
– MyTARDIS

imgCIF/CBF development
• Vendor support for imgCIF/CBF (primarily Pilatus 6M, but also all major vendors)
• HDF5 backend for CBFlib (for seamless integration between imgCIF/CBF and NeXus/HDF5)
• Metadata mapping between imgCIF and NeXus (H. J. Bernstein, Diamond); this will require minimal changes to imgCIF
• New compression methods in anticipation of Dectris Eiger (2.5 gigapixels/second)

Aims of the DDD Workshop
• Review progress after 1 year (launch was August 2011)
• Review the choices for raw data archiving
• Proceed towards a policy and plan for raw data archiving, for approval by the IUCr Executive Committee at the IUCr Congress 2014 in Montreal

Options
• Do nothing to ensure raw data archiving;
• Do what we can, e.g. via raw data archiving at centralised facilities along with universities' own data archives, both as supplements to the processed data archiving at the CSD, PDB etc.;
• Seek a blue skies solution where all raw data are compulsorily archived at centralised repositories.

Let's go to it!