Data Exchange, Quality Assurance and Integrated Data Publication

The IUCr Diffraction Data Deposition Working Group
Activities since IUCr Madrid
John R. Helliwell, Tom Terwilliger and Brian McMahon
john.helliwell@manchester.ac.uk
terwilliger@lanl.gov
bm@iucr.org
Members of the IUCr Diffraction Data
Deposition Working Group
WG Members
Steve Androulakis (TARDIS representative)
Sol Gruner (Diffuse scattering specialist and SR Facility Director)
John R. Helliwell (Chair) (IUCr ICSTI Representative; Chairman, IUCr Journals Commission 1996-2005)
Loes Kroon-Batenburg (Data processing software)
Brian McMahon (IUCr CODATA Representative)
Tom Terwilliger (Representative of the Commission on Biological Macromolecules)
John Westbrook (wwPDB representative and COMCIFS)
Heinz-Josef Weyer (SR and Neutron Facility user)
By invitation
Chairs and delegates of IUCr Commissions
Consultants
Alun Ashton (Diamond Light Source (DLS); Data Archive leader there)
Herbert Bernstein (Head of the imgCIF Dictionary Maintenance Group and member of COMCIFS)
Frances Bernstein (Observer on data deposition policies)
Gerard Bricogne (Active software and methods developer)
Bernhard Rupp (Macromolecular crystallographer)
Terms of reference
• It is becoming increasingly important to deposit the raw
data from scattering experiments;
• A lot of valuable information gets lost when only structure
factors are deposited.
• A number of research centres, e.g. synchrotron and
neutron facilities, are fully aware of the need and have
established detector working groups addressing this
issue.
Key terminology
• Raw data
• Processed data
• Derived data
The data publication pyramid
- the publishers' view
Reilly, S., Schallier, W., Schrimpf, S., Smit, E. & Wilkinson, W. (2011). Report on
Integration of Data and Publications.
Available from http://www.stm-assoc.org/integration-of-data-and-publications/
Data flow in crystallography
[Diagram]
• Data stages: raw experimental data; reduced/processed data; derived data
• Workflow steps: experiment (synchrotron or laboratory); data reduction; structure solution and refinement (laboratory)
• Destinations: IUCr journals; other journals; chemistry databases (CCDC); biological structure databases (PDB)
• Legend: retained by scientist; archived at facility (~6 months); deposited; published/disseminated; validated
Publication flow in IUCr journals
[Diagram]
• Data stages: raw experimental data; reduced/processed data; derived data; published article
• Workflow steps: experiment (synchrotron or laboratory); data reduction; structure solution and refinement (laboratory); author; IUCr journals (validation, peer review, technical editing, publication)
• Files: raw data (imgCIF); structure factors (.fcf); CIF file (.cif); .xml, .rdf etc.; .sgml; .pdf; .html
• Outputs: preprint (.pdf, .html); article of record
• Destinations: chemistry databases (CCDC); bibliographic databases (ISI, etc.)
Cost and benefits analyses versus
obligations on researchers
• Archiving raw data was long considered out of reach, but a
variety of research fields and learned societies (e.g. ICSU,
CODATA, the Royal Society) now perceive it as within reach
• IUCr Journals has thus far ‘encouraged’ authors to retain
raw data
• Funding agencies believe their PIs are carefully retaining
their ‘data’
Activities since Madrid
• Organization and Communications
– Core working group
– Consultation group (dddwg@iucr.org list)
– Public forum (http://forums.iucr.org)
– Planning for workshop/briefings at Regional
meetings
The IUCr DDD Forum
• http://forums.iucr.org/
• Has provided a focus for documents and some
discussion
• Some documents have received around 500 views
Activities since Madrid
• Organization and Communications
• Analysis and Discussion
– CCP4BB discussion: engagement and
summary
– ICSTI Insights articles
– Data comparison paper (Kroon-Batenburg et
al.)
– Survey of facilities by IUCr Commission on
Synchrotron Radiation
ICSTI Insights
1. The Living Publication has
existed for many years for
crystallographers
John R. Helliwell and Brian
McMahon
2. Continuous improvement of
macromolecular crystal
structures
Thomas C. Terwilliger
3. Should the crystallographic
community require the archiving
of raw diffraction data from a
crystal, a fibre or a solution?
John R. Helliwell and Brian
McMahon
Activities since Madrid
• Organization and Communications
• Analysis and Discussion
• Pilot Experiments
– University of Manchester repository
– Diamond data sets DOI registration
Activities since Madrid
• Organization and Communications
• Analysis and Discussion
• Pilot Experiments
• Ongoing efforts
– imgCIF/CBF development
– PaNdata
– MyTARDIS
imgCIF/CBF development
• Vendor support for imgCIF/CBF (primarily Pilatus
6M, but also all major vendors)
• HDF5 backend for CBFlib (for seamless integration
between imgCIF/CBF and NeXus/HDF5)
• Metadata mapping between imgCIF and NeXus (H. J.
Bernstein, Diamond) – will require minimal changes to
imgCIF
• New compression methods in anticipation of Dectris
Eiger (2.5 gigapixels/second)
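
To give a feel for why new compression methods matter at this pixel rate, the arithmetic can be sketched as below. Only the 2.5 gigapixels/second figure comes from the slide; the 2 bytes per pixel, the one-hour duration and the 4:1 compression ratio are illustrative assumptions, and the function names are hypothetical.

```python
def raw_data_rate_bytes_per_second(pixels_per_second: float,
                                   bytes_per_pixel: int) -> float:
    """Uncompressed detector output rate in bytes per second."""
    return pixels_per_second * bytes_per_pixel


def archive_volume_terabytes(rate_bytes_per_second: float,
                             seconds: float,
                             compression_ratio: float = 1.0) -> float:
    """Storage needed for continuous acquisition, in terabytes (1 TB = 1e12 bytes)."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return rate_bytes_per_second * seconds / compression_ratio / 1e12


# Pixel rate quoted on the slide.
PIXEL_RATE = 2.5e9

# Assumption: 2 bytes (16 bits) stored per pixel.
ASSUMED_BYTES_PER_PIXEL = 2

rate = raw_data_rate_bytes_per_second(PIXEL_RATE, ASSUMED_BYTES_PER_PIXEL)

# Assumption: one hour of continuous acquisition, with and without
# a hypothetical 4:1 lossless compression ratio.
hour_uncompressed_tb = archive_volume_terabytes(rate, 3600)
hour_compressed_tb = archive_volume_terabytes(rate, 3600, compression_ratio=4.0)

print(f"Rate: {rate / 1e9:.1f} GB/s")
print(f"One hour, uncompressed: {hour_uncompressed_tb:.1f} TB")
print(f"One hour, 4:1 compressed: {hour_compressed_tb:.1f} TB")
```

Real detectors rarely run continuously at peak rate, so these numbers are an upper bound under the stated assumptions rather than a forecast of archive size.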
Aims of the DDD Workshop
• Review progress after 1 year (launch was August
2011)
• Review the choices for raw data archiving
• Proceed towards a policy and plan for raw data
archiving, for approval by the IUCr Executive Committee
at the IUCr Congress 2014 in Montreal
Options
• Do nothing to ensure raw data archiving;
• Do what we can, e.g. raw data archiving at centralised
facilities along with universities' own data archives, both
as supplements to the processed data archiving at the
CSD, PDB etc.;
• Seek a blue skies solution where all raw data are
compulsorily archived at centralised repositories.
Let's go to it!