The Value of a High Quality Data Digital Library
Ross Wilkinson
Australian National Data Service
Seoul, December, 2015

Outline
- The Value of Data
- The Value of a Data Library
- Trends
- The research data assets of Australia
- The challenges for Data Libraries
- The opportunities for Data Libraries
- Conclusions

A growing Seoul:
- What data is needed to research the best forms of growth for Seoul?
- Data will come from government, environmental monitors, public transport data, research into urban design…
- …just as in most cities in the world
- The data needs integration, protection, reliability
- The data will need to be accessed through a single location
- …an urban digital library

What data environment is needed for:
- Understanding where and how to build in bushfire prone areas
- Understanding the largest living thing in Australia – the Great Barrier Reef
- The effective use of Australia’s soil
- ?????

Professor Peter Rathjen, VC of University of Tasmania:
“Why should Universities care about research data?”
- Reputation is very important to research institutions.
- Libraries can make a substantial contribution to that reputation.
- Libraries are known for their collections, so creating world class data collections can help a library build an institution’s reputation.

What’s going on?
- Data is no longer a by-product of research
- Data is valuable
- Data practice is changing in many research disciplines
- Funders and Government want more from their research investments
- So does research institution leadership

Data Value
- Stronger research
- More efficient research
- Stronger partnerships
- More industry engagement – data as a trust builder

The Value of Open Data Report
- The analysis in the report suggests that the value of data in Australia’s public research is at least $1.9 billion per annum and possibly up to $6 billion per annum – at 2012-13 levels of expenditure and activity.
- It is more valuable if it is available through appropriate research data infrastructure e.g.
  users of the British Atmospheric Data Centre report an average of 56% of their time working with data – that data is open and comes with appropriate tools.

What if we could transform research effort…
- By dramatically reducing the cost of gathering and publishing?

Some Trends:
- Reproducible Science
- Open Science
- Open Data
- Data Citation
- Data Citation Bibliometrics
- Data Journals
- Data Repositories
- Trusted Data Repositories
- FAIR Data
- Funded FAIR Data

Australian Research Data Activity
- Data Policy
- Capturing data valuable over long periods in Marine, Astronomy, Earth Sciences, Ecosystems
- …for a wide range of research purposes
- Supporting the storage of data
- Supporting the management of data
- Supporting the enhancement of data
- Building Institutional Research Data Capacity

Research Data Policy
- ARC and NHMRC: Treat data as an asset
- Department of Environment: Requirement that data is open, discoverable, and available
- Department of Education: The Australian Research Data Infrastructure Strategy provides recommendations for a coherent approach to research data and research data infrastructure

Integrated Marine Observing System
- IMOS is designed to be a fully integrated, national system, observing at ocean-basin and regional scales, and covering physical, chemical and biological variables.
- The IMOS Ocean Portal allows scientists to discover and explore data streams coming from the Facilities, some in near-real time, and all as delayed-mode, quality-controlled data.
- These data streams, long time series that are 'under construction', represent the actual research infrastructure being created and developed by IMOS.

Data is Transformative
- Governments are not investing in research data to make life easier for researchers
- Investments in research data are made to enable societal problems to be addressed
- This requires data to be in a form that allows a wide variety of use

AURIN – Urban data infrastructure
- How can I increase the value of my suburban property development?
- How do I make it more “liveable” to attract more buyers?
- Integrate data from developers, local government, state government, federal government, mapping data, roads data, public transport maps…
- Apply University of Melbourne developed “walkability” index

How do you develop suburbs that work for residents, developers and local government?
- Along the Maribyrnong River, 10 km from Melbourne’s CBD, 128 ha of government land is ripe for redevelopment
- It could accommodate 3000 dwellings and offices for 3000 people
- Planning a sustainable, liveable community integrated into its urban surrounds demands information on transport, health services, environment, housing prices, recreation facilities and more
- This comes from Federal and State government agencies, local councils, utilities and private companies
- For Maribyrnong, data and 80 tools to manage it are being made available through the Australian Urban Research Intelligence Network (AURIN) and the Australian National Data Service (ANDS)
- New tools, such as employment opportunities and walkability, are being added
- Similar projects can facilitate development across Australia’s cities and towns

Australian National Data Service:
To make Australia’s research data assets more valuable for its researchers, research institutions and the nation

So we need to transform:
Data that are:
- Unmanaged
- Disconnected
- Invisible
- Single use
To Structured Collections that are:
- Managed
- Connected
- Findable
- Reusable
Value
so that researchers can easily publish, discover, access and use research data.

Research Data Australia

What worked well:
- Getting going
- Establish a “voice for data”
- Coherence of research data infrastructure
- Coordination of policy and infrastructure
- Establishing research institutions at the centre of research data system
- Establishing a national system of infrastructure complementing institutional and thematic infrastructure
- Establishing international cooperation

Major Open Data Program
- Connecting mining data, to research techniques, to industry exploration
- Connecting Twitter data to a Jakarta map to analytics for managing flooding
- Collecting tropical data to institutional strategy
- Collecting ancient DNA for forming international partnerships for new results

Achievements to Date:
- Australian Research Data Commons established
- 100,000 data collections are described and discoverable
- ANDS has formed partnerships with most Australian universities and publicly funded research organisations
- Research Institutions have substantially greater research data management capacity than 5 years ago
- Research data is on the agenda of DVCs-R
- Jointly, Australia has world leading research data infrastructure
- Australia has a leading role in world research data infrastructure through the Research Data Alliance

Data Opportunities – and threats
- Data sharing is great for trust development
- Data openness challenges traditional business models
- Data partners can be anywhere – EU is investing €1.4B in open data to drive jobs and innovation

From G. Boulton
- Royal Society publishes “Science as an open enterprise” – written by Geoffrey Boulton
- Influential in EU/UK

FAIR Data – (FORCE 11)
To be Findable:
- (meta)data are assigned a globally unique and eternally persistent identifier.
- data are described with rich metadata.
- (meta)data are registered or indexed in a searchable resource.
- metadata specify the data identifier.
To be Accessible:
- (meta)data are retrievable by their identifier
- the protocol is open, free, and universally implementable
- the protocol allows for an authentication and authorization procedure, where necessary.
- metadata are accessible, even when the data are no longer available.
To be Interoperable:
- (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- (meta)data use vocabularies that follow FAIR principles.
- (meta)data include qualified references to other (meta)data.
To be Re-usable:
- (meta)data have a plurality of accurate and relevant attributes.
- (meta)data are released with a clear and accessible data usage license
- (meta)data are associated with their provenance.
- (meta)data meet domain-relevant community standards.

EU Open Data “Pilot”
- 1.4B Euros as part of H2020
- 80% take up

Data citation
- Data that is used should be cited – just as other work is cited
- Provides appropriate credit
- Enables reproduction
- DataCite provides reliability
- Agreed basic information: Creator (Publication year), Title, Publisher, Identifier
- Suitably formatted DOI

Data citation works with..
- ORCID – for people
- Crossref – for papers
- Fundref – for funders
- IGSN – for specimens
- …
Connection is key
- And the connections should be machine operable
- Research is more valuable if it is more connected
Can we measure the value? Bibliometricians arise!

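The agreed citation elements on the data citation slide (Creator, Publication year, Title, Publisher, Identifier) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the field names and the exact punctuation are choices made here, not something the slides prescribe, and the DOI and dataset details in the usage note are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Minimal citation record; field names are illustrative assumptions."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    doi: str  # bare DOI without resolver prefix, e.g. "10.1234/example"


def format_citation(c: DatasetCitation) -> str:
    """Render: Creator (Publication year): Title. Publisher. Identifier.

    The identifier is written as a resolvable https://doi.org/ link,
    which is one common reading of "suitably formatted DOI".
    """
    return (
        f"{c.creator} ({c.publication_year}): {c.title}. "
        f"{c.publisher}. https://doi.org/{c.doi}"
    )
```

For example, `format_citation(DatasetCitation("Example, A.", 2015, "Sample Reef Survey", "Example Data Centre", "10.1234/example"))` yields a single citation string with the elements in the order listed on the slide. All values in that call are made up for illustration.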
Data Journals
- Geoscience Data Journal (Wiley)
- Scientific Data (Nature)
- Journal of Open Archaeology Data (Ubiquity)
- Biodiversity Data Journal (Pensoft)
- A means of describing the data – its formation, properties, usage
- Enables recognition of a contribution
- Enhances usage of the data
- Enables “traditional” bibliometrics

So data is more valuable if:
- It supports Reproducible Science
- It supports Open Science
- Is Open
- Is Citable
- Is published
- Is reliably available
- Is available from a reliable digital library
- Is FAIR
- It reliably uses the data services that are discussed at ADLC 2015

Advertisement: Research Data Alliance
- You may agree that data preservation is important
- You may agree that international agreements are important
- Using the Research Data Alliance working groups is a good way of getting wider agreement for issues that are important to you

Data Libraries (repositories):
Provide:
- Data storage
- Metadata storage
- Data access methods
- Data management software
- Data analysis services?
- Data processing services?
But also:
- Integrated approach to content and metadata
- Policies, processes, services, and people
- Overall commitment to the stewardship of digital materials

Trusted data repositories (libraries)
- Need for reliable data
- Trusted repositories:
  - Trusted Repositories Audit & Certification (TRAC) – ISO 16363
  - Data Seal of Approval, e.g. Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
- Often required by publishers
- May be increasingly required (and funded) by research funders

The Opportunity
- Fully integrated publication of all outputs of a scholarly endeavour with rich connection
- FAIR data in a trusted repository
- Fully explorable scholarly journals
- Researchers get much better exposure of their research
- The outcomes are defensible
- New research and partners become available

So that’s good…
- But a full function digital library has more to offer
- Where is the biggest saving in research?
- Where do the breakthroughs come from?

From a bioinformatician – Matjaz Hren
The biggest wastes of time in research are:
- Meetings – need ELN integration
- Data entry – need automated data and metadata capture tools
- Data search – need rich data catalogues

Dan Steinberg, Salford Systems
- In community of data miners and statistical modelers
- Most working at major corporations supporting extensive analytical projects
- Spend 80% of their effort in manipulating the data so that they can analyze it

Ashley Buckle, Protein Crystallographer
- Required to prepare rich descriptions of data for associated publication
- It took him and a librarian a week of effort
- A tool that automated the capture of data from the synchrotron, migrated it, added metadata, added project information, added DOI
- It now takes 15 minutes to prepare data

Long Term Ecological Research Network
From the report at http://knowledgeinfrastructures.org:
"Our call for methodological and collaborative innovation is best explained via an analogy in the natural sciences. Twenty years ago, the average ecologist worked on a patch of land no larger than a hectare, typically for a few months or a year, gathered data over a thirty-year career, published results, and then gradually lost the data. With the creation of the Long Term Ecological Research Network (LTER), the National Science Foundation began to change the nature of research. Today, at a number of sites nationally and in consonance with international projects, ecologists are able to look beyond the scale of a field and timeframe of a career: they now have the prospect of studying ecology and climate locally, nationally, globally, and over spans of time that more closely match those of ecological change."

So research is changing
- More, and more complex data
- It’s getting harder to wade through it
- Yet insight is often connecting the pieces, seeing patterns, using new techniques
- …not being a poor information professional with home grown data and tools

A key role:
- A data library AND a data librarian can play a key role in reducing the cost of data capture, gathering and preparation, as well as of data publication
- Thus effort is transferred from researchers to information systems and information professionals
- ..to where it should be because it saves money, and adds reliability to research

What’s needed of a digital repository?
- You can find the data you’ve generated or need
- You can open the data you’ve generated or need
- You understand what the data is and what it’s about
- You can use or work with the data in the way you need
- You trust the data is what it says it is
Source: Managing Digital Continuity, UK National Archives, 2011

So we really can change the picture:
- Big data: Data size, complexity, reliability
- By dramatically reducing the cost of gathering and publishing, through reliable data libraries and librarians

Conclusions
- Research data is valuable
- It should be expected that the data underpinning findings are available for scrutiny
- Far greater value is available, especially if it is findable, accessible, interoperable and reusable
- This is helped if data is collected, used and published with reliable data libraries

Thank you!
ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).
This work is licensed under a Creative Commons Attribution 3.0 Australia License