conf-OCDE-melero.ppt

TIP Thematic Workshop on Open Science
and Open Data
OECD Conference Centre, Paris
12 December 2013
Remedios Melero. Email: rmelero@iata.csic.es
Spanish National Research Council (CSIC)
A Conversation With BioMed Central’s Cockerill on Open Access Publishing
by Abby Clobridge. Published by Information Today, Inc., posted 12 November 2013
Available at: http://newsbreaks.infotoday.com/nbreader.asp?ArticleId=93196&PageNum=2
Cockerill will be staying on with BMC through the end of the year and isn’t yet
talking in specifics about what’s next. Even so, Cockerill was quite excited
when considering the future:
“Big data is of course the big buzzword right now. The biomedical area is
producing petabytes of data. There’s lots of technology being thrown at it, but
there are still lots of silos of information which need to be better
integrated to advance scientific knowledge and improve healthcare. We
need to bring it all together, so data can be combined, visualized, reanalyzed and interpreted, to drive advances in knowledge and to deliver
better therapies. It’s a fascinating space which is full of opportunities
and I’m looking forward to again being involved in a field at an early
stage of development. There’s something challenging but rewarding about
that.”
“…Open access to scientific results and data is a great way to boost
science, boost the economy, and enable new techniques and
collaborations between disciplines. Really it's quite simple: it's about
ensuring you can see the results you've already paid for through your
taxes….”
Open data is data that meets the criteria of intelligent
openness. Data must be accessible, useable,
assessable and intelligible (extracted from Science as
an Open Enterprise, 2012)
Accessible
Data must be located in such a manner that it can readily be found and in a form that
can be used.
Useable
In a format where others can use the data or information. Data should be able to be
reused, often for different purposes, and therefore will require proper background
information and metadata.
Assessable
In a state in which judgments can be made as to the data or information’s reliability.
Intelligible
Comprehensible to those who wish to scrutinise something.
eScience features
Identified (persistent and unique identifier)
Explanatory metadata
Accessible
Usable
Re-usable
Preserved
http://commons.wikimedia.org/wiki/File:LOD_Cloud_Diagram_as_of_September_2011.png
New models of e-journals with datasets
The journal publishes peer-reviewed data papers describing
public health datasets with high reuse potential
Ubiquity Press Metajournals
“If there is a suitable subject repository for the data files, please
deposit them there and then include the Accession Number(s) or other
Identifiers and database details in your article. For some data types such as
genetic sequences and protein structures, it is essential that the data
are deposited in GenBank and Protein Data Bank, respectively. For X-ray
crystal structures, please also submit your validation reports.
For all other data, please let us know the file types you have and the
approximate total size of your datasets and then we will arrange with you
the best way to transfer the data to us. We will then review it and then
deposit it on your behalf in a stable data repository” (from the author’s
guidelines)
GigaDB contains datasets and assigns DOIs
From a Sample Data Descriptor…
New journal published by Nature Publishing Group (video),
to be launched in spring 2014
http://www.nature.com/scientificdata/
Deposited at..
Cited in the
reference list
Open data for other uses
Some cases...
The DPLA is a platform that enables new and transformative uses of our
digitized cultural heritage. The DPLA's application programming interface
(API) and open data can be used by software developers, researchers, and
others to create novel environments for learning, tools for discovery, and
engaging apps.
http://dp.la/
The Renewable Energy and Energy Efficiency Partnership
(REEEP) is a Public-Private Partnership launched at the
Johannesburg World Summit in 2002. http://data.reegle.info/
Enterprise Surveys: since 2002, the World Bank has collected survey data through
face-to-face interviews with top managers and business owners in over 130,000
companies in 135 economies. http://www.enterprisesurveys.org/
The Global Open Data for Agriculture and Nutrition (GODAN) initiative seeks to
support global efforts to make agricultural and nutritionally relevant data
available, accessible, and usable for unrestricted use worldwide. Launched in
October 2013. http://godan.info/
The SGC (Structural Genomics Consortium) is a not-for-profit, public-private
partnership with the directive to carry out basic science of
relevance to drug discovery. http://www.thesgc.org/
The Research Data Alliance (RDA) http://rd-alliance.org/
The Research Data Alliance implements the technology, practice, and
connections that make Data Work across barriers.
Funders:
Australian National Data Service
The European Commission, through the iCORDI project (7th Framework Programme)
National Science Foundation
http://recodeproject.eu/
The Policy RECommendations for Open Access to Research Data in Europe
(RECODE) project will leverage existing networks, communities and projects
to address challenges within the open access and data dissemination and
preservation sector and produce policy recommendations for open access to
research data based on existing good practice.
Data Citation
Data Citation Cycle
Identification of datasets favours their use and citation
Australian National Data Service. http://www.ands.org.au/cite-data/index.html
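As a minimal sketch of how an identified dataset becomes citable, the snippet below assembles a citation string from a few metadata fields, following the commonly recommended pattern "Creator (Year). Title. Publisher. Identifier". The class name, field names, and the sample record (including its DOI) are hypothetical and only for illustration; the one external convention assumed is that a DOI can be written as a link under https://doi.org/.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: List[str]   # author names, already formatted as "Surname, I."
    year: int             # publication year of the dataset
    title: str            # dataset title
    publisher: str        # repository or data centre holding the data
    doi: str              # persistent identifier, without the resolver prefix


def format_data_citation(record: DatasetRecord) -> str:
    """Build a citation of the form: Creator (Year). Title. Publisher. DOI link."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Illustrative record only: the names and DOI below are invented.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(format_data_citation(example))
```

A string like this can be placed in an article's reference list, so the dataset is cited in the same way as a paper.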
See Piwowar et al. (2013). Data reuse and the open data citation advantage.
PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1
Studies that created gene expression microarray data and made them
available in GEO (Gene Expression Omnibus) received more citations than those
for which data were not available.
Bertil Dorch (2012). On the Citation Advantage of linking to data.
http://hprints.org/hprints-00714715
Papers published in The Astrophysical Journal from 2000 to 2010 with links
to data archived in ADS (Astrophysics Data System)
Papers with links to data received on average 50% more citations per
paper per year than papers without links to data.
Papers published between 1993 and 2010 in the journal Paleoceanography with links
to data archived in PANGAEA®
Publicly available data were thus significantly associated with about 35% more
citations per article than the average of all articles sampled over the 18-year
study period, and the increase is fairly consistent over time (14 of 18 years).
http://www.komfor.net/blog/unbenanntemitteilung
HOWs and WHYs to support Open Research Data
Science as an Open Enterprise. The Royal Society Science Policy Centre report
02/12. Available at http://royalsociety.org/policy/projects/science-public-enterprise/report/
The Denton Declaration: An Open Access Data Manifesto. A product of the
3rd Annual University of North Texas Symposium on Open Access, 2012.
Principles
http://openaccess.unt.edu/denton-declaration
LERU (League of European Research Universities) statements on Open
Access and Open Data
What can universities do?
• Implement data management policies
• Create and support technical infrastructure
• Run advocacy programmes (how researchers should manage their data)
• Work together with funders to share infrastructure and best practices
The value of Research data. Metrics for datasets from a cultural and technical
point of view. http://www.knowledge-exchange.info/datametrics
Recommendations targeted at the most important stakeholders involved in the
promotion and generation of data sharing
Funders
• Demand and reward data
sharing activities
• Consider data metrics in
assessments
• Inform about the importance and
benefits of data sharing
• Promote open access of data
Research Institutions
• Promote policies of data sharing
• Promote arguments and incentives
in favour of data sharing
• Provide options and alternatives to
the different types of data sharing
activities
• Professionalize staff and
standardize data sharing activities
(collection, curation, dissemination)
Scientists
• Include data sharing as good scientific
and scholarly practice
• Promote data citation as the formal
way of acknowledging data sharing
• Perform more research on benefits
and possibilities of data sharing
• Define codes of conduct for
disciplines considering appropriate
regulations, e.g. embargo periods,
anonymisation etc.
Libraries
• Promote data publications and data
citations
• Coach scholars and research
managers in their data publication and
citation activities
• Inform authors about other data
sharing stakeholders (e.g. funders,
repositories, data centres)
• Develop tools to find data repositories
• Develop and test appropriate metrics
Publication databases
• Collect and measure data publications and data citations
• Facilitate the analysis and metrics of data publications and data citations
Data centres
• Inform the scientific community about
data activities and services
• Help reduce the dispersion
of data repositories
• Develop robust solutions for the
preservation and standardisation of
the data storage and citations
• Develop tools for tracking the users
of the repositories
Publishers
• Promote data sharing in their
publications and journals
• Inform authors about other data
sharing stakeholders (e.g. repositories,
data centres)
• Support open access to data
Data can also generate new jobs
Thank you!
Merci!
Reme Melero
rmelero@iata.csic.es
Annex
Where to find datasets and data repositories?
http://www.datacite.org
Databib. Catalogue, directory and registry of data repositories.
http://databib.org/
Directory of data repositories
http://www.re3data.org/
Case: DRYAD
http://www.ands.org.au/