Moving forward our shared data agenda: a view from the publishing industry
ICSTI, March 2012

Data and the Scientific Article
Researchers perceive data sets as “important, but hard to access”
Source: Publishing Research Consortium, 2010 (researchers, N = 3824)

Overview: Data & the Scientific Article
• Current approaches
• Thoughts for the future

Supplementary Material
• Authors can upload Supplementary Material with their paper
Pros
• Coupling of data and article
• Peer review
• Citation mechanism
• Preservation (byte-wise)
Cons
• Limited data type support
• Compatibility (format support)
• Limited capacity
• Data not centrally stored

Connecting with Data Repositories, 1
Article Linking example: CCDC
Link to CCDC database (indicates that information for this article is available)
Screenshot of journal article on ScienceDirect (http://dx.doi.org/10.1016/j.jfluchem.2009.07.015)

Connecting with Data Repositories, 2
Article Linking example: CCDC
... clicking on the CCDC logo takes the reader to a page at the CCDC repository with data related to the article
Screenshot of information page at CCDC (Cambridge Crystallographic Data Centre)

Connecting with Data Repositories, 3
Entity Linking example: Genbank Accession Number
Tagged Genbank entry (genetic sequence)
Screenshot of journal article on ScienceDirect (http://dx.doi.org/10.1016/j.biortech.2010.03.063)

Connecting with Data Repositories, 4
Entity Linking example: Genbank Accession Number ...
clicking on the linked Genbank accession code takes the reader to an information page on the NCBI data repository about that specific genetic sequence
Screenshot of information page at NCBI (National Center for Biotechnology Information)

Connecting with Data Repositories, 5

Database                       Subject                 Type of Linking
CCDC                           Crystallography         Article-level
PANGAEA                        Earth Sciences          Article-level*
EMBL Molecular Interactions    Chemistry               Entity, tagging
Molecular INTeraction DB       Chemistry               Entity, tagging
Genbank                        Nucleotides             Entity, tagging
UniProt                        Proteins                Entity, tagging
Protein Data Bank              Proteins                Entity, tagging
ClinicalTrials                 Medicine                Entity, tagging
TAIR (Arabidopsis)             Model organism          Entity, tagging
Mendelian Inheritance in Man   Genetics, inheritance   Entity, tagging

*: with Application

The Article of the Future

Discovery and Use via SciVerse Applications
Features & Benefits
• Use information from SciVerse and the web
• Support for rich user interfaces
• Integrated directly into the online article
• Simple to build using Content and Framework APIs
• Open standards (Apache Shindig, Open Social)

Discovery and Use via SciVerse Applications
Openness and Interoperability
• Give me your data, my way…
Personalization
• Know who I am and what I want…
Collaboration and trusted views
• The right contacts, at the right time…

“Apps interacting with results are very important to help save time…”
Applications can target specific information to facilitate content mining and shorten search time, leaving more time for analysis.
Researchers can save time and improve their information discovery process

“what faculty is really after is something that ties this altogether, so its all in one place…”
Applications help researchers extract all information (content, data, figures, etc.) into a single analysis source, which can be a local database at the customer’s institute.
Libraries can become a focal point for applications

Applications example: NCBI Genome Viewer
• Scans the article and builds a list of sequences based on NCBI accession numbers tagged in the article
• View/analyze sequence data from genes in the article using NCBI Sequence Viewer
• See specific information about each strand; zoom in/out; export data
Screenshots of journal article on ScienceDirect (http://dx.doi.org/10.1016/j.ygeno.2007.07.010)

Applications example: PANGAEA
• Document identifier sent to PANGAEA data repository for earth sciences
• PANGAEA returns a map plotted with the locations where the cited data was collected
• Push-pins open with details of the dataset and a direct link to the data on PANGAEA.de
Screenshots of journal article on ScienceDirect (http://dx.doi.org/10.1016/S0377-8398(01)00044-5)

Elsevier Enables Content Mining
Customers may:
• Perform extensive mining operations on subscribed content:
  ◦ Structuring input text
  ◦ Deriving patterns within the structured text
  ◦ Evaluation and interpretation of the output
• Run extensive searches and use locally loaded content for text mining purposes for their own research
• Extract semantic entities from Elsevier content for the purpose of recognition and classification of the relations between them
• Integrate results on a server used for the customer’s own mining system, for access and use by its researchers through the customer’s internal secure network
Enabling developers who wish to design and implement applications to analyse our content, or to test applications as part of their research within Elsevier content

Our Content Mining Solution Suite
CONTENT DELIVERY SEARCH & WORKFLOW SOLUTIONS ANALYSIS

Current initiative overview
◦ Supplementary Material
◦ Linking to Data Repositories
◦ Presentation via Article of the Future
◦ Discovery and Use via SciVerse Applications
◦ Empower scientists to mine content and use locally
***************************
◦ Data store (600 terabytes at present)
◦ Executable papers
◦ Workflow tools
◦ Etc.
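
Illustration: the entity-linking step described for the NCBI Genome Viewer (scanning an article for NCBI accession numbers and building a list of sequences) might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the method used on ScienceDirect: the pattern covers only two classic accession shapes (one letter plus five digits, two letters plus six digits), and the names ACCESSION_RE, NCBI_URL, find_accessions and link_accessions are hypothetical, as is the URL template.

```python
import re

# Assumption: only two classic nucleotide accession shapes are matched
# (1 letter + 5 digits, or 2 letters + 6 digits). Production tagging would
# need a curated pattern set and disambiguation against other codes.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})\b")

# Assumption: illustrative link template, not taken from the slides.
NCBI_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{acc}"


def find_accessions(text):
    """Return unique accession-like tokens in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.finditer(text):
        acc = match.group(0)
        if acc not in seen:
            seen.append(acc)
    return seen


def link_accessions(text):
    """Map each accession-like token found in the text to a repository URL."""
    return {acc: NCBI_URL.format(acc=acc) for acc in find_accessions(text)}


if __name__ == "__main__":
    sample = "Sequences were deposited under AF123456 and U12345; AF123456 was reused."
    for acc, url in link_accessions(sample).items():
        print(acc, url)
```

A pattern-only approach like this will also match unrelated codes that happen to share the same shape, which is one reason tagging during production, as the slides describe, is more reliable than matching after the fact.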

Conclusions: some thoughts for the future
FUNDERS, PUBLISHERS, RESEARCHERS, INSTITUTIONS
Need for aligned strategies and policies, sustainable business models, and concerted collaboration