Josefine Nordling
CSC – IT Center for Science
LIBER 41st Annual Conference
27th of June 2012

Content Outline
• Introduction
• Stakeholder groups
• Objectives
• Phases of data re-use
• Work phases
• Key findings
• Data pyramids
• Final words

Background
• An FP7 project proposed by APA
• 9 partners: European Organization for Nuclear Research (CERN, coordinators), Alliance for Permanent Access (APA), Helmholtz Association (HA), UK Science and Technology Facilities Council (STFC), British Library (BL), Association of European Research Libraries (LIBER), German National Library (DNB), International Association of Scientific, Technical and Medical Publishers (STM) & IT Center for Science (CSC)
• Started 01/11/2010, ends 30/11/2012 (PM 1-25)

Stakeholder Groups
• 5 stakeholder groups:
  - Libraries
  - Data Centres
  - Policy Makers & Funders
  - Publishers
  - Data Producers/Owners

How stakeholders interact
[Diagram: Research Institutes, Researcher, Publishers, Libraries and Data Centres]

General objectives
• Best practices in data sharing, re-use, preservation and citing
• Emerging best practices & lessons learned, but also “success stories”, “near misses” & “honourable failures”
• Challenges, drivers, barriers & enablers

Concrete objectives
• Evidence gathering enabling/providing:
  - Key players to compare visions and explore shared opportunities
  - Different perspectives on data re-use
  - Improved understanding of best practices within RDM – more coherent national policies and wider implementation of e-Infrastructure
  - Information available for Horizon 2020
  - A vocabulary for data re-use

[Diagram, phases of data re-use. Labels: Research Strategy; Preservation Business Case; Project Funding; Preservation Planning; Data collection/simulation; Prearchive phase; Data Analysis; Scientific Publication; Data Preservation; Social & Economic Impact; Creation; Discover data; Access data]

Talking, listening, engaging, influencing
• Communication with relevant stakeholder groups – visibility for ODE
• Forum for all target audiences – policy discussions & comparing visions
• Collaborations between projects – input and feedback
• PR materials

Data sharing today
• Develop a broad understanding of the overall issues to be addressed by ODE
• Identifying “success stories”, “near misses” and “honourable failures” by conducting 21 interviews, covering:
  - Attitudes within different scientific communities at national and international level
  - Researchers’ access to e-Infrastructures
• Ten tales of drivers and barriers in data sharing

Data enters scholarly communication
• The impact of data sharing, re-use and preservation on scholarly communication
• Publishers’ role: stricter editorial policies, enhancing articles, guidelines etc.
• Integration of datasets and publications – libraries & data centres
• Informal interviews (researchers, authors, editors, readers, data centres and libraries) & surveys of libraries (110 responses)

Drivers and barriers: questions and answers
• Inform stakeholders of drivers and barriers to data sharing
• Extension of the use of data sharing beyond the Member States
• Researchers’ benefits from data re-use – mapping the stakeholders willing to enable this
• Revision of statements through consultation with experts (workshops, interviews, structured methods)
• Identify a set of key findings

The future of e-Infrastructures for data sharing
• “To demonstrate the value of information gathered and distil the results from the two conferences and the various areas investigated in previous work packages in order to ensure that each of the project’s target audiences can make informed decisions about the future of e-Infrastructures for data sharing and preservation.”

The future of e-Infrastructures for data sharing (continued)
• Categorisation of key findings – support e-Infrastructure, describe possibilities and impact of data sharing, re-use and preservation
• The roles of data in the future
• Publications on the findings tailored to each stakeholder group – gathering together previous results
• Still ahead: preparation of a thematic publication and a final report

Challenges
• Delivery of information on the benefits of data
• More training in RDM needed for researchers
• More cross-cutting international discussions are needed
• Covering the costs of data availability and re-use, also after a project’s end
• Confidential and sensitive data requires specific access controls
• The data deluge in itself

Drivers
• Increased impact if data is used and cited by other researchers
• Publishers are developing collaborations with researchers and data centres
• Data regeneration is far more expensive than data preservation
• Many publishers support data hosting and data linking services
• Re-use of data in meta-studies to find hidden trends
• Authors are increasingly using publishers’ data services

Barriers
• Researchers’ hesitation to publish and share their data
• Patenting issues
• Lack of investment in libraries to support development within RDM
• Publishing supplementary data alongside articles is expensive
• National reluctance to invest in global data infrastructures
• Federal, national and institutional restrictions due to strategic interests

Enablers
• Citation and recognition frameworks
• Clear instructions on data citation
• Easy processes for submission of data – lowering the barriers for researchers
• Join functions with scholarly communication
• Working closely with researchers with encouraging motives
• Engaging in establishing uniform data citation standards

Enablers (continued)
• Expert knowledge for setting ground rules for data re-use
• Acting based on the requirements of the research community
• Preservation of data to ensure continued access to linked data
• Support for cross-links between publications and datasets

The Pyramid’s likely short term reality:
[Pyramid diagram. Layer labels: Publ. with Data; Processed & Represent. Data; Data Archives; Data on Disks and in Drawers]
(1) Top of the pyramid is stable but small
(2) Risk that supplements to articles turn into data dumping places
(3) Too many disciplines lack a community-endorsed data archive
(4) Estimates are that at least 75 % of research data is never made openly available

The Ideal Pyramid
[Pyramid diagram. Layer labels: Data In Publications; Article Supps; Data Archives; Data on Disks and in Drawers]
(1) More integration of text and data, viewers and seamless links to interactive datasets
(2) Only if data cannot be integrated in the article, and only relevant extra explanations
(3) Seamless links (bidirectional) between publications and data, interactive viewers within the articles
(4) More Data Journals that describe datasets, data management plans and data methods

Lastly
• Slowly moving in the right direction towards the “best ways” of engaging in RDM
• Emerging awareness throughout the community
• Data centres, libraries and publishers are keen on developing their services
• More and more collaborations are taking place
• Next step: convincing the researchers of the benefits of publishing, sharing and re-using data
• http://www.ode-project.eu/

Thank You!
Josefine Nordling
Project Coordinator, CSC
Josefine.Nordling@csc.fi