Data Curation: Challenges and Opportunities for Research Libraries

DATA CURATION:
CHALLENGES AND OPPORTUNITIES
FOR RESEARCH LIBRARIES
Brian E. C. Schottlaender
The Audrey Geisel University Librarian
26 September 2012
OSU “Library Futures” Seminar
SHOULD I TALK ABOUT …
• … declining:
– budgets?
– numbers of staff?
– transactions?
• … closing branch libraries?
• … “rationalizing” collections?
• … repurposing space?
• … bottom-up strategic planning?
• … moving to a service program–based organizational structure?
NO, I THINK I’LL TALK ABOUT …
DATA CURATION
OVERVIEW
• The Scholarly Record
• Stewardship
• Data Curation
• Why do data need to be curated?
• Why should libraries curate data?
• What should research libraries do?
THE SCHOLARLY RECORD?
The scholarly record is …
“… that which has already been written in all
disciplines ... that stable body of graphic information,
upon which each discipline bases its discussions, and
against which each discipline measures its progress.”
Ross Atkinson.
“Text Mutability and Collection Administration.”
Library Acquisitions: Practice & Theory,
Vol. 14 (1990)
WHAT DOES THE SCHOLARLY RECORD INCLUDE?
• E-only journals
• Reviews
• Preprints and working papers
• Encyclopedias, dictionaries, and annotated content
Nancy L. Maron and K. Kirby Smith.
Current Models of Digital Scholarly Communication:
Results of an Investigation Conducted by Ithaka for the
Association of Research Libraries (November 2008)
“THE SCHOLARLY RECORD”
Scholarly Publishing (e.g., journal articles)
Libraries; Trusted Third Parties (e.g., JSTOR, Portico)
Stable
WHAT DOES THE SCHOLARLY RECORD INCLUDE?
• E-only journals
• Reviews
• Preprints and working papers
• Encyclopedias, dictionaries, and annotated content
• Data resources
Nancy L. Maron and K. Kirby Smith.
Current Models of Digital Scholarly Communication:
Results of an Investigation Conducted by Ithaka for the
Association of Research Libraries (November 2008)
“THE SCHOLARLY RECORD”
Scholarly Publishing (e.g., journal articles)
Libraries; Trusted Third Parties (e.g., JSTOR, Portico)
Stable
WHAT DOES THE SCHOLARLY RECORD INCLUDE?
• E-only journals
• Reviews
• Preprints and working papers
• Encyclopedias, dictionaries, and annotated content
• Data resources
• Blogs
• Discussion forums
• Professional and academic hubs
Nancy L. Maron and K. Kirby Smith.
Current Models of Digital Scholarly Communication:
Results of an Investigation Conducted by Ithaka for the
Association of Research Libraries (November 2008)
“THE SCHOLARLY RECORD”
Infrastructures largely self-contained
• INPUTS: Scholarly Raw Material (e.g., archives, data)
– Archives; Data Centers [Some in Libraries; Some Not]
– Less Stable
• OPERATORS: Scholarly Inquiry/Discourse (e.g., blogs, wikis, open notebooks)
– ?????
– Very unstable; Emergent
• OUTPUTS: Scholarly Publishing (e.g., journal articles)
– Libraries; Trusted Third Parties (e.g., JSTOR, Portico)
– Stable
STEWARDSHIP 1
“Stewardship is a core value that includes notions of
mission, responsibility, integrity, trust, accountability,
service, preservation and sustainability for future use.”
Sharon E. Farb.
“Libraries, Licensing, and the
Challenge of Stewardship.”
First Monday, Vol. 11, No. 7 (3 July 2006)
“As a society and as educational institutions, we have a
collective responsibility to preserve and make available,
along a continuum of a life cycle, our digital heritage.”
Jeffrey L. Horrell.
“Converting and Preserving the
Scholarly Record: An Overview.”
LRTS, Vol. 52, No. 1 (January 2008)
STEWARDSHIP 2
• “There is a need for a close linking between digital data archives,
scholarly publications, and associated communication. The
potential for an expanded role for research libraries in the area of
digital data stewardship affords opportunities to address these
important linkages.”
• “Stakeholder groups have different expertise, outlooks,
assumptions, and motivations … Collaboration models to share
expertise and resources will be critical.”
To Stand the Test of Time—Long-Term Stewardship
of Digital Data Sets in Science and Engineering:
A Report to the National Science Foundation from the
ARL Workshop on New Collaborative Relationships (2006)
STEWARDSHIP 3
• “Historically, universities have played a leadership role in the
advancement of knowledge and shouldered substantial
responsibility for the long-term preservation of knowledge
through their university libraries. An expanded role for some
research and academic libraries and universities, along with
other partners, in digital data stewardship is a topic for critical
debate and affirmation.”
• “The scale of the challenge regarding the stewardship of digital
data requires that responsibilities be distributed across multiple
entities and partnerships that engage institutions, disciplines, and
interdisciplinary domains.”
To Stand the Test of Time … (2006)
DATA CURATION: WHAT IS IT?
“The activity of managing and promoting the use of
data from its point of creation, to ensure it is fit for
contemporary purpose, and available for discovery and
reuse. For dynamic datasets this may mean continuous
enrichment or updating to keep it fit for purpose.
Higher levels of curation will also involve maintaining
links with annotation and other published materials.”
Philip Lord, Alison Macdonald,
Liz Lyon, and David Giaretta.
“From Data Deluge to Data Curation.”
eScience All Hands Meeting 2004 (2004)
DATA CURATION: WHAT’S IT INCLUDE?
• Design
• Creation or Collection
• Processing
• Analysis
• Appraisal
• Selection
• Description
• Discovery
• Dissemination
• Repurposing
• Storage
• Preservation
• Etc.
CURATION MODEL
Panos Constantopoulos, et al.
“DCC&U: An Extended Digital
Curation Lifecycle Model.”
The International Journal
of Digital Curation,
Issue 1, Vol. 4 (2009)
ACTORS …
“As we move from small to large scale data sharing, where
data are managed and maintained for broad access, we also
are seeing an increase in the number and type of
intermediaries. Intermediaries, in the form of organizations
and the people who work for them, prepare data for reuse
by eliciting, organizing, storing, packaging and/or
preserving data, and by performing various roles in
dissemination and facilitation …”
Ixchel M. Faniel and Ann Zimmerman.
“Beyond the Data Deluge: A Research
Agenda for Large-Scale Data Sharing and
Reuse.” The International Journal of
Digital Curation, Issue 1, Vol. 6 (2011)
… AND STAKEHOLDERS
• Disciplinary experts
• Functional experts
– Developers
– Curators
– Preservationists
• Users
• Archives
• Data Centers
• Libraries
• Institutions
• Professional Societies
• Publishers
• Governments
THE CURATION ECOSYSTEM 1
• Data Providers
• Policy Makers
• Funders
• Service Providers
• Systems Providers
• Data Consumers
THE CURATION ECOSYSTEM 2
“… the activities of curation are highly
interconnected within a system of systems,
including institutional, national, scientific,
cultural, and social practices as well as economic
and technological systems. Data curation is a
nascent set of technologies and practices emerging
in the context of this complex and rapidly evolving
socio[economic]-technical ecosystem.”
Anna Gold.
“Data Curation and Libraries:
Short-Term Developments, Long-Term Prospects.”
http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1027&context=lib_dean
WHY DO DATA NEED TO BE CURATED?
• “The more effectively that data can be manipulated,
mined, managed, analyzed and served to communities,
the better the conduct of science can be supported.”
• “The more we can eliminate boundaries in this
exponentially growing sea of data, the better data can
be shared enabling multidisciplinary and collaborative
research …”
• “The more effectively students and faculty gain the
data intensive knowledge and skills, the larger the
impact will be on science and society.”
NSF-OCI Task Force
on Data and Visualization.
Report Draft Final (March 7, 2011)
WHY DO DATA NEED TO BE CURATED?
– BECAUSE DATA REUSE REQUIRES IT.
• WHY DO DATA NEED TO BE REUSED?
– BECAUSE TRANS-DOMAIN RESEARCH REQUIRES IT.
• WHY IS TRANS-DOMAIN RESEARCH IMPORTANT?
– BECAUSE SOLVING GRAND CHALLENGES REQUIRES IT.
• WHY IS SOLVING GRAND CHALLENGES IMPORTANT?
– BECAUSE THEY AFFECT ALL OF US.
WHY DO DATA NEED TO BE CURATED? 3
BECAUSE THE GOVERNMENT SAYS SO.
WHY SHOULD RESEARCH LIBRARIES CURATE DATA?
• Because we can:
“Research libraries, archives, and other stewardship institutions
have the capacity to aggregate and hold data, manage metadata,
deal with rights management and access, and help users.”
• Because we must:
“… uncurated data are as good as lost, even if the bits are stored
forever, because they cannot be interpreted correctly.”
• Because, left to their own devices, scientists won’t:
“… many if not most scientists focus on the shortest path to a
particular scientific result rather than the best long-term solution
for data reuse or data-service …”
NSF-OCI Task Force
on Data and Visualization.
Report Draft Final (March 7, 2011)
WHAT SHOULD RESEARCH LIBRARIES DO?
1. Stop waiting and start proactive engagement locally.
2. Stake a claim in the production cycle.
3. Start retraining and repurposing staff.
4. Be a doer, not a broker, wherever possible.
5. Consider digital curation collaborations.
6. Actualize collaborative engagement.
Tyler Walters and Katherine Skinner.
New Roles for New Times:
Digital Curation for Preservation.
Association of Research Libraries (2011)
WHAT HAVE I DONE?
• Reached out to the San Diego Supercomputer Center (on
whose Executive Committee I sit) to co-create the campus’
Research Cyberinfrastructure Initiative (RCI), funded by the
Chancellor.
• Leveraged the NDSA-funded Chronopolis Federated
Preservation Environment to create a Research Data
Curation Services Program.
• Hired a Director, and reallocated portions of two domain
specialists and a metadata analyst to her.
• Created Sample Data Management Plans for various NSF
Directorates.
• Launched five curation pilots in the Humanities and the
Sciences.
• Joined DPN and am preparing to field-test Chronopolis as a
DPN data triad.
AND SO …
AN EXAMPLE
AND SO …
CONCLUSION
• Digital scholarly output cannot be de-coupled from the
raw material and inquiry operations that generate that
output, at least not as easily as analog scholarly output
can be.
• It can’t be, it needn’t be, and it shouldn’t be.
• Its stewardship calls for a more expansive view of what
constitutes the scholarly record, a view that
encompasses more and different inputs, outputs, and
stakeholders; and a more distributed and interoperant
organizational and technical infrastructure.
QUESTIONS?