Role of Libraries in Data Curation

The Role of Libraries in
Data Curation
Or ‘How do we even get
started?’
John MacColl
European Director,
RLG Partnership
9 June 2010
What I want to talk about
• The importance of data
• Institutional vs domain solutions
• Skills needs
• Our project
• Reward structures
The role of libraries in data curation. RLG Partnership Annual Meeting, Chicago, June 2010
The importance of data
It’s the data, stupid
• ‘astronomers are just as likely to
point a software query tool at a
digital sky survey as to point a
telescope at the stars’ (The
Economist, Feb 2010)
• ‘“It’s like the invention of the
telescope,” Franco Moretti, a
Stanford professor of English and
comparative literature, says of
Google Books. “All of a sudden, an
enormous amount of matter
becomes visible.”’ (The Chronicle,
‘The humanities go Google’, May
28 2010)
DataVerse (Gary King, 2007)
“Data sometimes exist on individual
researchers’ Web sites, without
professional backups, off-site
replication, plans for format conversion
and migration, or professional
cataloging.”
Pious hopes (Carole Palmer)
• 60% ‘archive’ generated or collected
data (no offsite backup)
• 61% expect to keep more than 10 years
Data lost, and data never born (U Wisconsin Summary
Report of the Research Data Management Study Group
(2009))
‘In some cases, inadequate
storage capacity is leading to
loss of data: forcing some
researchers to discard data
from past experiments in order
to make room for current ones
or to avoid certain types of
experiments and research
altogether’
Data and their uses
Freely available
Locked away
Embargoed
Shared with collaborators
Secondary artifacts: statistical and pattern analyses; subset
extractions; visualisations; simulations; discovery environments;
transformations
Primary data: sensory, numeric, digitised, geospatial, etc
Ancillary data: questionnaires, fieldnotes, lab notebooks, data
dictionaries, annotations, lecture notes, etc
Don’t try this at home?
Institutional vs domain solutions
Blue Ribbon Task Force on Sustainable Digital
Preservation and Access: on aggregation
‘Creating economies of scale among archives
when possible is always desirable, and may be
critical when the materials under stewardship
require particular kinds of expertise that are
scarce. This is the case for much scientific
data.’
Qualified gravitational pull (Green and
Gutmann)
‘Most institutional repositories do not and cannot
offer support for managing dataset formats over time
… Policies for long-term stewardship vary among
institutions, but many have developed a sliding scale
of preservation promises’
Oxford University: Research data management services:
findings of the consultation with service providers
(September 2008)
Cornell DataStaR: a ‘staging repository’
Datasets in Cornell IR
Monash approach (institutional) (Treloar)
U Wisconsin proposal
‘Solutions comprised solely of expensive
technology will fail, because of the underlying
need to establish long-lasting cultural stability
within and between the research, library, and
IT communities on campus.’
Curation responsibilities (Carlson,
The Chronicle, 2006)
“Data from Big Science is … easier to handle, understand and
archive.
Small Science is horribly heterogeneous and far more vast. In time
Small Science will generate 2-3 times more data than Big Science.”
[Diagram labels: big science data; small science data; domain? institution?]
Experiments … failures …
• NSF DataNet – Data Conservancy project. $20m
awarded. Led by JHU. Includes social sciences.
• U. Va. Mellon grant $870k. Programmers and
archivists. Includes Stanford, Yale and Hull. To create
a model for digital collection management ‘that can
be easily shared among research libraries’.
• UKRDS 
Meanwhile …
Specialist data archives
Skills needs
Is this possible (Gabridge)?
‘libraries can develop existing liaisons with interest,
passion, and strong analytical skills; or they can
recruit domain experts, and teach them about
excellent information science practices.’
ARL study: Scott Brandt
Our project
Joint OCLC Research-LIBER
• Binghamton
• Brigham Young
• Cambridge
• Leeds
• Melbourne
• Nijmegen
• Oxford
Deliverables
• Desk research
• Case studies
• Interviews with researchers
• Report and recommendations
Project Aim
‘It has been frequently asserted in the
literature on data curation that there are new
service roles for research libraries emerging.
This project will seek to test this hypothesis
by considering the data curation requirements
of a number of recently completed research
projects in a sample group of North American
and European universities …’
Method
‘Each university partner will produce two or three
case studies of projects in which data has been
generated, and consider the data curation
implications of these … The project will conclude
with an assessment of the potential role of the
research library in general in relation to such
datasets, based on the examples of good practice
discovered via the case studies.’
Project Approach
‘The proposed project will adopt a ‘bottom-up’ approach and be grounded in the realities
of data storage and preservation behaviour as
exemplified in a number of real instances’
Scale again …
‘We consider that the question of how to arrive at an
articulation between the institutional library and
domain or funder data archives is one of the most
urgent requirements in this area, and the project will
explore it carefully.’
Environments: data
Timescapes (Leeds)
Nyman/Jones Archive (Leeds)
The Australian Women’s Register
(Melbourne)
Life Patterns (Melbourne)
Incremental Project (Cambridge)
What do we expect?
• Not a great deal!
• Need to adjust our timescales?
• Signs of progress?
• Indications of favourable organisational frameworks?
• Indications of favourable policies?
• A taking of stock …
Reward structures
Day’s understatement
Being excited about being cited
(DataVerse, King)
‘Articles with accessible data are cited twice
as often as otherwise equivalent articles that
do not provide data access.’
‘Articles in journals with replication policies
that make data available are cited thrice as
frequently as otherwise equivalent articles
without accessible data’
Library neutrality (Steinhart, 2007)?
‘There is ample evidence that even
when appropriate data repositories
exist for a particular discipline,
researchers often fail to take full
advantage of them … This lack of
participation in data sharing and
archival activities suggests an
opportunity for academic libraries
to provide a much-needed service’
Thinning the library
• No longer just about capture of outputs at the
endpoint
• The library has to be involved in the whole process of
research and scholarship, throughout its lifecycle
• This involves ‘thinning out’ the library
• Rethinking the point of engagement
• The library becomes engineering …
• … and people
Ten Questions to Begin a Conversation With Your
Faculty About Data Curation (Witt & Carlson)
1. What is the story of your data?
2. What form and format are the data in?
3. What is the expected lifespan of your data?
4. How could your data be used, reused, and repurposed?
5. How large is your dataset, and what is its rate of growth?
6. Who are potential audiences for your data?
7. Who owns the data?
8. Does the dataset include any sensitive information?
9. What publications or discoveries have resulted from the data?
10. How should the data be made accessible?
Repositories at present are the wrong
model (Green and Gutmann)
‘repositories position themselves at or near
the end of the scientific research life cycle.
Their goal is less to partner with researchers
or with domain-specific repositories
throughout the research life cycle than … to
garner the value of the institution’s
productivity’
Appraisal (Cornell)
‘The archivist can no longer wait “passively at the
end of the life cycle for records to arrive at the
archives when their creators no longer wanted them –
or were dead” (Cook 2000).’
Discussion!
John MacColl
Next up
Lunch and then…
1:00
Framing Libraries and the Environment
Lorcan Dempsey, OCLC Research
Buckingham