Accessing the data: going beyond what the author wanted to tell you

Interactive Publications and the Record of Science
ICSTI Winter Workshop
Paris, Monday, February 8, 2010

Brian McMahon
International Union of Crystallography
5 Abbey Square, Chester CH1 2HU, UK
bm@iucr.org

PDFs and data impoverishment

Henry Rzepa: Publishers are likely to love interactive PDF, since it is easy to archive. However ... such objects are data impoverished. Whereas with Jmol, one is obliged to provide semantically accurate data (e.g. CML or equivalent), the PDF object is simply a (pre)rendering of that data. Thus reconstituting a useful molecule from Jmol is trivial (and that reconstitution can then be used for many other purposes), reconstituting a molecule from a 3D PDF is likely to be non trivial, and will almost certainly suffer information loss compared to the original data. By all means, provide both, but I strongly urge that a 3D PDF should not be the only object provided.

http://www.mail-archive.com/jmol-users@lists.sourceforge.net/msg13417.html, 19 December 2009

Jmol interactive visualizations

• Not new: Biochem. J. (2008). 412, 399–413
• Bespoke design / implementation
• Expensive
• Requires consultation
• Supplementary information

The right tool for the job

Jmol

Then (ca. 2004):
• Protein structures (RasMol)
• Small organic chemical molecules (Chime)

Now:
• Crystal lattices (symmetry)
• Inorganic materials (coordination polyhedra)
• Displacement ellipsoids
• Symmetry operations
• Electron orbitals
• Electron density maps

Making it easier to use

• Editing toolkit http://submission.iucr.org/jtkt
• High-quality immediate visual feedback
• Context-sensitive help
• Manuals, examples, tutorials
• Reference: McMahon, B. & Hanson, R. M. (2008). J. Appl. Cryst. 41, 811–814. A toolkit for publishing enhanced figures

Interactive molecular visualizations enhance understanding

Acta Cryst. (2008). F64, 156–162

• Rotate
• Modify orientation
• Alternative representations
• Overlay representations
• Interrogate

Infrastructure for publication workflow

• Server/client architecture
• Ability to create interactive figures before or during article submission/review
• Opportunity for peer review/revision
• Auto-generation of static equivalent
• Easy generation/activation of multiple scripts to provide alternative views

Requirements for routine publication of enhanced figures

• Platform independence
• Web access for authors
• Serving visualization application and data
• Integration into submission/review procedures
• Integration into journal production workflow
• Automated generation of static copy (for failsafe/PDF edition/archiving)
• Authoring tools

The authoring environment

• The author uploads a data file (CIF)
• The system provides different default styles according to the type of structure
• The author edits and annotates the view
• The author may supply additional scripts
• The author saves the result as an enhanced figure + publication-quality static figure

Saving the enhanced figure

• Interactive applet
• Active scripts provided by the author
• High-resolution static image
• Option to view dynamic or static image online
• Link to allow peer review

The toolkit editing interface

• Essential tool for authors
• Accommodates novice and advanced users
• Tabbed interface allows authors to concentrate on scientific aspects of visualization
• Presets tuned to journal style requirements
• Live testing, preview and feedback mechanisms

Submission/review

• Author may prepare enhanced figure ahead of publication
• Simply enter URL of edit workspace when asked to ‘upload source files’
• Presented alongside other conventional figures
• Available for peer review
• Can be edited in response to referee comments

Interactive authorship: publBio

http://publbio.iucr.org

• Start with the data (PDB), example 3jw1
• Add structured text
• Online look-up:
  • authors
  • references
  • crystallization solution components
• Validation
  • references
• Visualisation (Jmol)
• Update data file as submission vehicle

Uniform (compatible) markup systems

• Crystallographic Information Framework (CIF)
  • Treat data/metadata, text/numerical data as peers
  • Domain-specific extensions (dictionaries = ontologies)
  • Image format
• Some data fields may need to contain richer content
  • Text markup
  • Mathematical equations
  • Interactive figure scripts
• Machine validation of dictionary attributes
• Methods

Conclusions

• The working scientist really wants to interact with the data
• What interactive PDF offers is currently limited
• Publishers should develop compatible architectures
• Need domain-specific implementations (learned societies)
• Investment in new applications; integration with workflow
• Education for a new paradigm
• Archiving
  • requires more standardisation
  • proper compound document model
  • concentrate on data (or semantic content), not the implementation
  • ‘record not what it looks like, but what you are looking at’
• Distributed content sources
  • data not necessarily integral part of document
  • retrieval of non-discrete data sets