Moving Forward our Shared Data Agenda

Moving forward our shared data agenda: a view from the publishing industry
ICSTI, March 2012
Data and the Scientific Article
Researchers perceive data sets as “important, but hard to access”
Publishing Research Consortium, 2010 (researchers, N = 3,824)
Overview: Data & the Scientific Article
• Current approaches
• Thoughts for the future
Supplementary Material
• Authors can upload Supplementary Material with their paper
Pros
• Coupling of data and article
• Peer review
• Citation mechanism
• Preservation (byte-wise)
Cons
• Limited data type support
• Compatibility (format support)
• Limited capacity
• Data not centrally stored
Connecting with Data Repositories, 1

Article Linking example: CCDC
Link to CCDC database (indicates that information for this article is available)
Screenshot of journal article on ScienceDirect (http://dx.doi.org/10.1016/j.jfluchem.2009.07.015)
Connecting with Data Repositories, 2

Article Linking example: CCDC
... clicking on the CCDC logo takes the reader to a page at the CCDC repository with data
related to the article
Screenshot of information page at CCDC (Cambridge Crystallographic Data Centre)
Connecting with Data Repositories, 3

Entity Linking example: GenBank Accession Number
Tagged GenBank entry (genetic sequence)
Screenshot of journal article on ScienceDirect (http://dx.doi.org/10.1016/j.biortech.2010.03.063)
Connecting with Data Repositories, 4

Entity Linking example: GenBank Accession Number
... clicking on the linked GenBank accession number takes the reader to an information page
on the NCBI data repository about that specific genetic sequence
Screenshot of information page at NCBI (National Center for Biotechnology Information)
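Entity linking of this kind can be sketched in a few lines. This is a minimal illustration, assuming plain article text and a simplified accession pattern (one letter plus five digits, or two letters plus six digits, with an optional version suffix); it is not the tagging pipeline used on ScienceDirect, and the function name `link_accessions` is hypothetical. The target URL uses NCBI's nucleotide (`nuccore`) record pages.

```python
import re

# Simplified GenBank nucleotide accession pattern (an assumption, not the full
# INSDC rules): 1 letter + 5 digits, or 2 letters + 6 digits, optional ".version".
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")

# NCBI nucleotide record pages are addressed by accession number.
NCBI_NUCCORE = "https://www.ncbi.nlm.nih.gov/nuccore/"


def link_accessions(text: str) -> dict[str, str]:
    """Map each accession number found in the text to an NCBI record URL."""
    links: dict[str, str] = {}
    for match in ACCESSION_RE.finditer(text):
        accession = match.group(0)  # keeps the version suffix if present
        links.setdefault(accession, NCBI_NUCCORE + accession)
    return links


print(link_accessions("Sequences AB123456.1 and X12345 were deposited (Fig. 2)."))
```

A pattern this loose can also match unrelated strings of the same shape, so explicit tagging in the article markup is the more reliable route.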
Connecting with Data Repositories, 5
Database                       Subject                  Type of linking
CCDC                           Crystallography          Article-level
PANGAEA                        Earth sciences           Article-level*
EMBL Molecular Interactions    Chemistry                Entity, tagging
Molecular INTeraction DB       Chemistry                Entity, tagging
GenBank                        Nucleotides              Entity, tagging
UniProt                        Proteins                 Entity, tagging
Protein Data Bank              Proteins                 Entity, tagging
ClinicalTrials                 Medicine                 Entity, tagging
TAIR (Arabidopsis)             Model organism           Entity, tagging
Mendelian Inheritance in Man   Genetics, inheritance    Entity, tagging
* with application
The Article of the Future
Discovery and Use via SciVerse Applications
Features & Benefits
• Use information from SciVerse and the web
• Support for rich user interfaces
• Integrated directly into the online article
• Simple to build using Content and Framework APIs
• Open standards (Apache Shindig, OpenSocial)
Discovery and Use via SciVerse Applications
Openness and Interoperability
• Give me your data, my way…
Personalization
• Know who I am and what I want…
Collaboration and trusted views
• The right contacts, at the right time…
“Apps interacting with results are very important to help save time…”
Applications can target specific information to facilitate content mining and shorten search time, leaving more time for analysis.
Researchers can save time and improve their information discovery process.
“What faculty is really after is something that ties this all together, so it’s all in one place…”
Applications help researchers extract all information (content, data, figures, etc.) into a single analysis source, which can be a local database at the customer’s institution.
Libraries can become a focal point for applications.
Applications example: NCBI Genome Viewer



Scans the article and builds a list of sequences based on the NCBI accession numbers tagged in the article
View/analyze sequence data from genes in the article using NCBI Sequence Viewer
See specific information about each strand; zoom in/out; export data
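The viewer's first step, building a list of sequences from the tagged accession numbers, might look like the following sketch. The `<entity type="genbank">` markup is a hypothetical stand-in, since the slides do not describe the actual tagging schema, and `tagged_accessions` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; the real ScienceDirect tagging schema may differ.
SAMPLE_ARTICLE = """
<article>
  <p>Genes were amplified (<entity type="genbank">AB123456</entity>)
  and compared with <entity type="genbank">X12345</entity>.</p>
  <p>A second mention of <entity type="genbank">AB123456</entity>.</p>
  <p>Structure <entity type="pdb">1ABC</entity> was also used.</p>
</article>
"""


def tagged_accessions(xml_text: str, entity_type: str = "genbank") -> list[str]:
    """Return unique tagged identifiers of one entity type, in document order."""
    root = ET.fromstring(xml_text)
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops repeats
    for element in root.iter("entity"):
        if element.get("type") == entity_type and element.text:
            seen.setdefault(element.text.strip(), None)
    return list(seen)


print(tagged_accessions(SAMPLE_ARTICLE))  # ['AB123456', 'X12345']
```

Deduplicating while preserving order means each sequence appears once in the viewer's list, in the order the reader first meets it in the article.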
Screenshots of journal article on ScienceDirect (http://dx.doi.org/10.1016/j.ygeno.2007.07.010)
Applications example: PANGAEA



The document identifier is sent to PANGAEA, the data repository for earth sciences
PANGAEA returns a map plotted with the locations where the cited data were collected
Push-pins open with details of dataset and direct link to data on PANGAEA.de
Screenshots of journal article on ScienceDirect (http://dx.doi.org/10.1016/S0377-8398(01)00044-5)
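The request/response flow above can be sketched as follows, assuming the repository returns one record per dataset with a title, coordinates and a landing-page URL. `DatasetRecord`, `doi_from_url`, `to_pins` and `bounding_box` are hypothetical names, the sample records are invented for illustration, and the actual call to PANGAEA is omitted because its interface is not described here.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urlparse


@dataclass
class DatasetRecord:
    """Hypothetical shape of one dataset record returned by the repository."""
    title: str
    latitude: float
    longitude: float
    url: str


def doi_from_url(url: str) -> str:
    """Strip the resolver prefix from a DOI link, leaving the bare DOI."""
    return unquote(urlparse(url).path.lstrip("/"))


def to_pins(records: list[DatasetRecord]) -> list[dict]:
    """One push-pin per dataset: position, popup text and a direct data link."""
    return [
        {"lat": r.latitude, "lon": r.longitude, "popup": r.title, "link": r.url}
        for r in records
    ]


def bounding_box(records: list[DatasetRecord]) -> tuple[float, float, float, float]:
    """(min_lat, min_lon, max_lat, max_lon) so the map can zoom to fit all pins.

    Naive: ignores datasets that straddle the antimeridian.
    """
    lats = [r.latitude for r in records]
    lons = [r.longitude for r in records]
    return min(lats), min(lons), max(lats), max(lons)


# Illustrative records only; these are not real PANGAEA datasets.
records = [
    DatasetRecord("Core A", 10.5, -20.0, "https://example.org/dataset/a"),
    DatasetRecord("Core B", -5.25, 15.0, "https://example.org/dataset/b"),
]
doi = doi_from_url("http://dx.doi.org/10.1016/S0377-8398(01)00044-5")
pins = to_pins(records)
print(doi, bounding_box(records))
```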
Elsevier Enables Content Mining
Customers may:
• Perform extensive mining operations on subscribed content:
  ◦ Structuring input text
  ◦ Deriving patterns within the structured text
  ◦ Evaluating and interpreting the output
• Run extensive searches and use locally loaded content for text mining in their own research.
• Extract semantic entities from Elsevier content in order to recognise and classify the relations between them.
• Enable developers who wish to design and implement applications to analyse our content, or to test applications on Elsevier content as part of their research.
• Integrate results on a server used for the customer’s own mining system, for access and use by its researchers through the customer’s internal secure network.
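The three mining steps listed above (structuring input text, deriving patterns, evaluating the output) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not Elsevier's solution: entities are found by dictionary lookup, a "pattern" is simply an entity pair that co-occurs in at least two sentences, and evaluation is precision and recall against a hand-checked set of relations. The entity dictionary, sample text and gold relations are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical entity dictionary; a real system would use curated vocabularies.
ENTITIES = {"aspirin", "cox-2", "inflammation", "ibuprofen"}


def structure(text: str) -> list[set[str]]:
    """Step 1: split text into sentences and keep the entities each one mentions."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [set(re.findall(r"[a-z0-9-]+", s.lower())) & ENTITIES for s in sentences]


def derive_patterns(structured: list[set[str]], min_count: int = 2) -> set[tuple[str, str]]:
    """Step 2: entity pairs that co-occur in at least min_count sentences."""
    counts = Counter(
        pair for ents in structured for pair in combinations(sorted(ents), 2)
    )
    return {pair for pair, n in counts.items() if n >= min_count}


def evaluate(predicted: set[tuple[str, str]],
             gold: set[tuple[str, str]]) -> tuple[float, float]:
    """Step 3: precision and recall against a hand-checked set of relations."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


TEXT = ("Aspirin inhibits COX-2. COX-2 drives inflammation. "
        "Aspirin reduces inflammation via COX-2. Ibuprofen was not studied.")
GOLD = {("aspirin", "cox-2"), ("aspirin", "inflammation")}

patterns = derive_patterns(structure(TEXT))
print(sorted(patterns), evaluate(patterns, GOLD))
```

Keeping the steps as separate functions mirrors the slide's breakdown: each stage can be swapped (for example, a trained entity recogniser in place of the dictionary) without touching the others.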
Our Content Mining Solution Suite
Content · Delivery · Search & Workflow Solutions · Analysis
Current initiative overview
◦ Supplementary Material
◦ Linking to Data Repositories
◦ Presentation via Article of the Future
◦ Discovery and Use via SciVerse Applications
◦ Empowering scientists to mine content and use it locally
***************************
◦ Data store (600 terabytes at present)
◦ Executable papers
◦ Workflow tools
◦ Etc.
Conclusions: some thoughts for the future
Funders · Publishers · Researchers · Institutions
Need for aligned strategies and policies, sustainable
business models, and concerted collaboration