Talk PPT - icadl 2015

The Value of a High Quality Data
Digital Library
Ross Wilkinson
Australian National Data Service
Seoul, December, 2015
1
Outline
 The Value of Data
 The Value of a Data Library
 Trends
 The research data assets of Australia
 The challenges for Data Libraries
 The opportunities for Data Libraries
 Conclusions
2
A growing Seoul:
 What data is needed to
research the best forms
of growth for Seoul?
 Data will come from
government,
environmental
monitors, public
transport data, research
into urban design…
 ….just as most cities in
the world
 The data needs
integration,
protection, reliability
 The data will need to
be accessed through a
single location
….an urban digital library
3
What data environment is needed for:
 Understanding where and how
to build in bushfire prone areas
 Understanding the largest living
thing in Australia – the Great
Barrier Reef
 The effective use of Australia’s
soil
?????
4
Professor Peter Rathjen, VC of University of Tasmania:
“Why should Universities care about research data?”
Reputation is very important to research institutions.
Libraries can make a substantial contribution to that
reputation.
Libraries are known for their collections, so creating
world class data collections can help a library build an
institution’s reputation.
5
What’s going on?
 Data is no longer a by-product of research
 Data is valuable
 Data practice is changing in many research
disciplines
 Funders and Government want more from their
research investments
 So does research institution leadership
6
Data Value
 Stronger research
 More efficient research
 Stronger partnerships
 More industry engagement – data as a trust builder
7
The Value of Open Data Report
The analysis in the report suggests that the value of data
in Australia’s public research is at least $1.9 billion per
annum and possibly up to $6 billion per annum – at
2012-13 levels of expenditure and activity.
It is more valuable if it is available through appropriate
research data infrastructure
e.g. users of the British Atmospheric Data Centre report
an average of 56% of their time working with data – that
data is open and with appropriate tools.
8
9
What if we could transform research effort..
By dramatically reducing the cost of gathering and publishing??
10
Some Trends:
 Reproducible Science
 Open Science
 Open Data
 Data Citation
 Data Citation Bibliometrics
 Data Journals
 Data Repositories
 Trusted Data Repositories
 FAIR Data
 Funded FAIR Data
Australian Research Data Activity
 Data Policy
 Capturing data valuable over long periods in
Marine, Astronomy, Earth Sciences, Ecosystems
…for a wide range of research purposes
 Supporting the storage of data
 Supporting the management of data
 Supporting the enhancement of data
 Building Institutional Research Data Capacity
12
Research Data Policy
 ARC and NHMRC: Treat data as an asset
 Department of Environment: Requirement that
data is open, discoverable, and available
 Department of Education: The Australian Research
Data Infrastructure Strategy provides
recommendations for a coherent approach to
research data and research data infrastructure
13
Integrated Marine Observing System
 IMOS is designed to be a fully-integrated, national system, observing
at ocean-basin and regional scales, and
covering physical, chemical and
biological variables.
 The IMOS Ocean Portal allows
scientists to discover and explore data
streams coming from the Facilities, some in near-real time, and all as
delayed-mode, quality-controlled
data. These data streams, long time-series that are 'under construction',
represent the actual research
infrastructure being created and
developed by IMOS.
14
Data is Transformative
 Governments are not investing in research data to
make life easier for researchers
 Investments in research data are made to enable societal
problems to be addressed
 This requires data to be in a form that allows a
wide variety of use
15
AURIN – Urban data infrastructure
 How can I increase the value of my suburban
property development?
 How do I make it more “liveable” to attract more
buyers?
 Integrate data from developers, local government,
state government, federal government, mapping
data, roads data, public transport maps….
 Apply University of Melbourne developed
“walkability” index
16
How do you develop suburbs that
work for residents, developers and
local government?
 Along the Maribyrnong River, 10 km from Melbourne’s CBD, 128 ha of government land is
ripe for redevelopment
 It could accommodate 3000 dwellings and offices for 3000 people
 Planning a sustainable, liveable community integrated into its urban surrounds demands
information on transport, health services, environment, housing prices, recreation
facilities and more
 This comes from Federal and State government agencies, local councils, utilities and
private companies
 For Maribyrnong, data and 80 tools to manage it are being made available through the
Australian Urban Research Intelligence Network (AURIN) and the Australian National Data
Service (ANDS)
 New tools, such as employment opportunities and walkability, are being added
 Similar projects can facilitate development across Australia’s cities and towns
17
Australian National Data Service:
To make Australia’s research data assets
more valuable for its researchers, research
institutions and the nation
18
So we need to transform:
Data that are:
Unmanaged
Disconnected
Invisible
Single use
To Structured Collections that are:
Managed
Connected
Findable
Reusable
Value
so that researchers can easily publish, discover, access
and use research data.
Research Data
Australia
20
What worked well:
 Getting going
 Establish a “voice for data”
 Coherence of research data infrastructure
 Coordination of policy and infrastructure
 Establishing research institutions at the centre of research
data system
 Establishing a national system of infrastructure
complementing institutional and thematic infrastructure
 Establishing international cooperation
21
Major Open Data Program
 Connecting mining data, to
research techniques, to
industry exploration
 Connecting Twitter data to
Jakarta map to analytics for
managing flooding
 Collecting tropical data to
institutional strategy
 Collecting ancient DNA for
forming international
partnerships for new results
22
Achievements to Date:
 Australian Research Data Commons established
 100,000 data collections are described and discoverable
 ANDS has formed partnerships with most Australian
universities and publicly funded research organisations
 Research Institutions have substantially greater research
data management capacity than 5 years ago
 Research data is on the agenda of DVCs-R
 Collectively, Australia has world-leading research data
infrastructure
 Australia has a leading role in world research data
infrastructure through the Research Data Alliance
23
Data Opportunities – and threats
 Data sharing is great for trust development
 Data openness challenges traditional business
models
 Data partners can be anywhere – EU is investing
€1.4B in open data to drive jobs and innovation
24
25
From G. Boulton
 Royal Society
publishes “Science as
an open enterprise” –
written by Geoffrey
Boulton
 Influential in EU/UK
26
FAIR Data – (FORCE 11)
To be Findable:
 (meta)data are assigned a globally unique
and eternally persistent identifier.
 data are described with rich metadata.
 (meta)data are registered or indexed in a
searchable resource.
 metadata specify the data identifier.
To be Accessible:
 (meta)data are retrievable by their identifier
 the protocol is open, free, and universally
implementable
 the protocol allows for an authentication and
authorization procedure, where necessary.
 metadata are accessible, even when the data
are no longer available.
To be Interoperable:
 (meta)data use a formal, accessible, shared,
and broadly applicable language for
knowledge representation
 (meta)data use vocabularies that follow FAIR
principles.
 (meta)data include qualified references to
other (meta)data.
To be Re-usable:
 (meta)data have a plurality of accurate and
relevant attributes.
 (meta)data are released with a clear and
accessible data usage license
 (meta)data are associated with
their provenance.
 (meta)data meet domain-relevant
community standards.
27
EU Open Data “Pilot”
 1.4B Euros as part of H2020
 80% take up
28
Data citation
 Data that is used
should be cited – just
as other work is cited
 Provides appropriate
credit
 Enables reproduction
 DataCite provides
reliability
 Agreed basic
information: Creator
(Publication year), Title,
Publisher, Identifier
 Suitably formatted DOI
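The agreed basic information above can be sketched as a small formatter. This is one possible rendering, assuming the order Creator (Publication year): Title. Publisher. Identifier, with the DOI written as an https://doi.org/ link; the class name, punctuation choices, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    creator: str
    year: int
    title: str
    publisher: str
    doi: str  # accepts a bare DOI, "doi:..." or a resolver URL

    def format(self) -> str:
        """Render the citation with the DOI as a resolvable link."""
        doi = self.doi.strip()
        # Strip common prefixes so the identifier is stored once, bare.
        for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]
                break
        return (f"{self.creator} ({self.year}): {self.title}. "
                f"{self.publisher}. https://doi.org/{doi}")


# Made-up example values; 10.1234/example is a placeholder DOI.
citation = DataCitation("Smith, J.", 2015, "Example Reef Survey",
                        "Example Data Centre", "doi:10.1234/example")
print(citation.format())
```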
29
Data citation works with..
 ORCID – for people
 Crossref – for papers
 FundRef – for funders
 IGSN – for specimens
 …
 Can we measure the value? Bibliometricians arise!
 Connection is key
 And the connections should be machine operable
 Research is more valuable if it is more connected
30
Data Journals
 Geoscience Data
Journal (Wiley)
 Scientific Data (Nature)
 Journal of Open
Archaeology
Data (Ubiquity)
 Biodiversity Data
Journal (Pensoft)
 A means of describing
the data – its formation,
properties, usage
 Enables recognition of a
contribution
 Enhances usage of the
data
 Enables “traditional”
bibliometrics
31
So data is more valuable if:
 It supports
Reproducible Science
 It supports Open
Science
 Is Open
 Is Citable
 Is published
 Is reliably available
 Is available from a
reliable digital library
 Is FAIR
 It reliably uses the data
services that are
discussed at ADLC 2015
32
Advertisement: Research Data Alliance
- You may agree that data preservation is important
- You may agree that international agreements are important
- Using the Research Data Alliance working groups is a good way
of getting wider agreement for issues that are important to you
Data Libraries (repositories):
Provide:
 Data storage
 Metadata storage
 Data access methods
 Data management
software
 Data analysis services?
 Data processing
services?
But also:
 Integrated approach
to content and
metadata
 Policies, processes,
services, and people
 Overall commitment
to the stewardship of
digital materials
34
Trusted data repositories (libraries)
 Need for reliable data
 Trusted repositories:
 Trusted Repositories Audit & Certification (TRAC) - ISO 16363
 Data Seal of Approval e.g. Pacific and Regional Archive for
Digital Sources in Endangered Cultures (PARADISEC)
 Often required by publishers
 May be increasingly required (and funded) by
research funders
35
36
The Opportunity
 Fully integrated publication of all outputs of a
scholarly endeavour with rich connection
 FAIR data in a trusted repository
 Fully explorable scholarly journals
 Researchers get much better exposure of their
research
 The outcomes are defensible
 New research and partners become available
37
So that’s good…
 But a full function digital library has more to offer
 Where is the biggest saving in research?
 Where do the breakthroughs come from?
38
From a bioinformatician – Matjaz Hren
 The biggest wastes of time in research are:
 Meetings – need ELN integration
 Data entry – need automated data and metadata
capture tools
 Data search – need rich data catalogues
39
Dan Steinberg, Salford Systems
 In community of data miners and statistical
modelers
 Most working at major corporations supporting
extensive analytical projects
 Spend 80% of their effort in manipulating the data
so that they can analyze it
40
Ashley Buckle, Protein Crystallographer
 Required to prepare rich descriptions of data for
associated publication
 Took him and a librarian a week of effort
 A tool that automated the capture of data from
the synchrotron, migrated it, added metadata,
added project information, added DOI
 Takes 15 minutes to prepare data
41
Long Term Ecological Research Network
From the report at http://knowledgeinfrastructures.org:
"Our call for methodological and collaborative innovation is best explained
via an analogy in the natural sciences. Twenty years ago, the average
ecologist worked on a patch of land no larger than a hectare, typically for a
few months or a year, gathered data over a thirty-year career, published
results, and then gradually lost the data. With the creation of the Long
Term Ecological Research Network (LTER), the National Science Foundation
began to change the nature of research. Today, at a number of sites
nationally and in consonance with international projects, ecologists are
able to look beyond the scale of a field and timeframe of a career: they
now have the prospect of studying ecology and climate locally, nationally,
globally, and over spans of time that more closely match those of ecological
change."
42
So research is changing
 More, and more complex data
 It's getting harder to wade through it
 Yet insight is often connecting the pieces, seeing
patterns, using new techniques
…not being a poor information professional with
home grown data and tools
43
A key role:
 A data library AND a data librarian can play a key
role in reducing both the cost of data capture,
gathering, preparation, as well as data publication
 Thus effort is transferred from researchers to
information systems and information professionals
 ..to where it should be because it saves money,
and adds reliability to research
44
What’s needed of a digital repository?
 You can find the data you’ve generated or need
 You can open the data you’ve generated or need
 You understand what the data is and what it’s
about
 You can use or work with the data in the way you
need
 You trust the data is what it says it is
Managing Digital Continuity UK National Archives 2011
45
So we really can change the picture:
Big data: Data size, complexity, reliability
By dramatically reducing the cost of gathering and publishing,
through reliable data libraries and librarians
46
Conclusions
 Research data is valuable
 It should be expected that the data underpinning
findings are available for scrutiny
 Far greater value is available, especially if the data is
findable, accessible, interoperable and reusable
 This is helped if data is collected, used and
published with reliable data libraries
47
Thank you!
ANDS is supported by the Australian Government through the National Collaborative
Research Infrastructure Strategy (NCRIS).
This work is licensed under a Creative Commons Attribution 3.0 Australia License
48