TIP Thematic Workshop on Open Science and Open Data
OECD Conference Centre, Paris, 12 December 2013
Remedios Melero. Email: rmelero@iata.csic.es
Spanish National Research Council (CSIC)

“A Conversation With BioMed Central’s Cockerill on Open Access Publishing”, by Abby Clobridge. Published in Information Today, Inc., posted 12 November 2013. Available at: http://newsbreaks.infotoday.com/nbreader.asp?ArticleId=93196&PageNum=2

Cockerill will be staying on with BMC through the end of the year and isn’t yet talking in specifics about what’s next. Even so, Cockerill was quite excited when considering the future: “Big data is of course the big buzzword right now. The biomedical area is producing petabytes of data. There’s lots of technology being thrown at it, but there are still lots of silos of information which need to be better integrated to advance scientific knowledge and improve healthcare. We need to bring it all together, so data can be combined, visualized, reanalyzed and interpreted, to drive advances in knowledge and to deliver better therapies. It’s a fascinating space which is full of opportunities and I’m looking forward to again being involved in a field at an early stage of development. There’s something challenging but rewarding about that.”

“…Open access to scientific results and data is a great way to boost science, boost the economy, and enable new techniques and collaborations between disciplines. Really it's quite simple: it's about ensuring you can see the results you've already paid for through your taxes….”

Open data is data that meets the criteria of intelligent openness. Data must be accessible, useable, assessable and intelligible (extracted from Science as an Open Enterprise, 2012):
Accessible: Data must be located in such a manner that it can readily be found and in a form that can be used.
Useable: In a format where others can use the data or information.
Data should be able to be reused, often for different purposes, and therefore will require proper background information and metadata.
Assessable: In a state in which judgments can be made as to the data or information’s reliability.
Intelligible: Comprehensible for those who wish to scrutinise something.

eScience features:
• Identified (persistent and unique identifier)
• Explanatory metadata
• Accessible
• Usable
• Re-usable
• Preserved

[Figure: Linked Open Data cloud diagram as of September 2011. Source: http://commons.wikimedia.org/wiki/File:LOD_Cloud_Diagram_as_of_September_2011.png]

New models of e-journals with datasets
The journal publishes peer-reviewed data papers describing public health datasets with high reuse potential (Ubiquity Press metajournals).

“If there is a suitable subject repository for the data files, please deposit them there and then include the Accession Number(s) or other Identifiers and database details in your article. For some data types such as genetic sequences and protein structures, it is essential that the data are deposited in GenBank and Protein Data Bank, respectively. For X-ray crystal structures, please also submit your validation reports. For all other data, please let us know the file types you have and the approximate total size of your datasets and then we will arrange with you the best way to transfer the data to us. We will then review it and then deposit it on your behalf in a stable data repository” (from the author’s guidelines)

GigaDB contains datasets and assigns DOIs.

From a sample Data Descriptor…
New journal published by Nature Publishing Group, to be launched in spring 2014: http://www.nature.com/scientificdata/
Annotations on the sample: “Deposited at…”, “Cited in the reference list”

Open data for other uses: some cases
The DPLA (Digital Public Library of America) is a platform that enables new and transformative uses of our digitized cultural heritage. The DPLA's application programming interface (API) and open data can be used by software developers, researchers, and others to create novel environments for learning, tools for discovery, and engaging apps.
http://dp.la/

The Renewable Energy and Energy Efficiency Partnership (REEEP) is a public-private partnership launched at the Johannesburg World Summit in 2002. http://data.reegle.info/

Since 2002, the World Bank has collected enterprise survey data through face-to-face interviews with top managers and business owners in over 130,000 companies in 135 economies. http://www.enterprisesurveys.org/

The Global Open Data for Agriculture and Nutrition (GODAN) initiative seeks to support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide. Launched in October 2013. http://godan.info/

The SGC (Structural Genomics Consortium) is a not-for-profit, public-private partnership with the directive to carry out basic science of relevance to drug discovery. http://www.thesgc.org/

The Research Data Alliance (RDA) http://rd-alliance.org/
The Research Data Alliance implements the technology, practice, and connections that make data work across barriers.
Funders:
• Australian National Data Service
• The European Commission, through the iCORDI project (7th Framework Programme)
• National Science Foundation

http://recodeproject.eu/
The Policy RECommendations for Open Access to Research Data in Europe (RECODE) project will leverage existing networks, communities and projects to address challenges within the open access and data dissemination and preservation sector, and produce policy recommendations for open access to research data based on existing good practice.

Data Citation
[Figure: Data citation cycle]
Identification of datasets favours their use and citation. Australian National Data Service: http://www.ands.org.au/cite-data/index.html
See Piwowar et al. (2013). Data reuse and the open data citation advantage.
PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
Studies that created gene expression microarray data and made them available in GEO (Gene Expression Omnibus) received more citations than those for which data were not available.

Bertil Dorch (2012). On the citation advantage of linking to data. http://hprints.org/hprints-00714715
Papers published in The Astrophysical Journal from 2000 to 2010 with links to data archived in ADS (Astrophysical Data System) received on average 50% more citations per paper per year than papers without links to data.

Papers published between 1993 and 2010 in the journal Paleoceanography with links to data archived in PANGAEA®: publicly available data were thus significantly associated with about 35% more citations per article than the average of all articles sampled over the 18-year study period, and the increase is fairly consistent over time (14 of 18 years). http://www.komfor.net/blog/unbenanntemitteilung

HOWs and WHYs to support open research data
Science as an Open Enterprise. The Royal Society Science Policy Centre report 02/12. Available at http://royalsociety.org/policy/projects/science-public-enterprise/report/
The Denton Declaration: An Open Access Data Manifesto. A product of the 3rd Annual University of North Texas Symposium on Open Access, 2012. Principles: http://openaccess.unt.edu/denton-declaration
LERU (League of European Research Universities) statements on open access and open data
What can universities do?
• Implement data management policies
• Create and support technical infrastructure
• Run advocacy programmes (on how researchers should manage their data)
• Work together with funders to share infrastructure and best practices

The value of research data. Metrics for datasets from a cultural and technical point of view.
http://www.knowledge-exchange.info/datametrics
Recommendations targeted at the most important stakeholders involved in the promotion and generation of data sharing:

Funders
• Demand and reward data sharing activities
• Consider data metrics in assessments
• Inform about the importance and benefits of data sharing
• Promote open access to data

Research institutions
• Promote data sharing policies
• Promote arguments and incentives in favour of data sharing
• Provide options and alternatives for the different types of data sharing activities
• Professionalize staff and standardize data sharing activities (collection, curation, dissemination)

Scientists
• Include data sharing as good scientific and scholarly practice
• Promote data citation as the formal way of acknowledging data sharing
• Perform more research on the benefits and possibilities of data sharing
• Define codes of conduct for disciplines, considering appropriate regulations, e.g. embargo periods, anonymisation, etc.

Libraries
• Promote data publications and data citations
• Coach scholars and research managers in their data publication and citation activities
• Inform authors about other data sharing stakeholders (e.g. funders, repositories, data centres)
• Develop tools to find data repositories
• Develop and test appropriate metrics

Publication databases
• Collect and measure data publications and data citations
• Facilitate the analysis and metrics of data publications and data citations

Data centres
• Inform the scientific community about data activities and services
• Help reduce the dispersion of data repositories
• Develop robust solutions for the preservation and standardisation of data storage and citation
• Develop tools for tracking the users of the repositories

Publishers
• Promote data sharing in their publications and journals
• Inform authors about other data sharing stakeholders (e.g. repositories, data centres)
• Support open access to data

Data can also generate new jobs

Thank you! Merci!
Reme Melero
rmelero@iata.csic.es

Annex
Where to find datasets and data repositories?
http://www.datacite.org
[Screenshot: example search with results and filters]
Databib. Catalogue, directory and registry of data repositories. http://databib.org/
[Screenshot: example]
Directory of data repositories: http://www.re3data.org/
Case: DRYAD http://www.ands.org.au/
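The data citation slides argue that a persistent identifier plus explanatory metadata is what makes a dataset citable. As a minimal sketch of that idea, the Python below assembles a citation string in a simplified version of the “Creator (PublicationYear): Title. Publisher. Identifier” pattern that DataCite has suggested, and renders the DOI through the http://dx.doi.org/ resolver used elsewhere in this presentation. The `DatasetRecord` type, its field names and the sample record are hypothetical; the DOI uses the 10.5072 test prefix, so it does not point to any real dataset.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical, minimal metadata record for a deposited dataset."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # e.g. the repository or data centre holding the data
    doi: str        # persistent identifier, without the resolver prefix


def format_citation(rec: DatasetRecord) -> str:
    """Build 'Creators (Year): Title. Publisher. Link' for a dataset."""
    if not rec.doi:
        # Without a persistent identifier the dataset cannot be cited reliably.
        raise ValueError("dataset has no persistent identifier")
    authors = "; ".join(rec.creators)
    # Express the DOI as a resolvable link so readers can reach the data directly.
    link = "http://dx.doi.org/" + rec.doi
    return f"{authors} ({rec.year}): {rec.title}. {rec.publisher}. {link}"


if __name__ == "__main__":
    example = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2013,
        title="Example survey dataset",
        publisher="Example Data Repository",
        doi="10.5072/example-123",  # 10.5072 is a test prefix, not a real dataset
    )
    print(format_citation(example))
```

Refusing to format a record that lacks an identifier mirrors the “Identified (persistent and unique identifier)” feature listed among the eScience features: a citation that cannot be resolved back to the data does little to support reuse or credit.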