HathiTrust
Sharing a Federal Print Repository: Issues and
Opportunities
May 25, 2011
Heather Christenson
Overview
• About HathiTrust as an organization
• Characteristics of the HathiTrust collection
• Collection development & management
• Changing library landscape
• How print management fits in
HathiTrust is a…
• Partnership of 50+ research libraries, founded in October 2008
• Distributed, collaborative enterprise
• Trusted digital preservation repository
• Digital library
• Service provider
• Platform to collectively explore new library models
Goals
• Comprehensive collection
• Preservation…with Access
• Shared strategies
– Collection management, development
– Preservation
– Copyright review
– Efficient user services
• Openness
HathiTrust Functional Framework
(Diagram; box labels listed below, grouped where the layout is recoverable.)
• Mission and Goals
• Governance
– Budget, Finances
– Decision-making
– Policy
• Enterprise Management
• Repository Administration
– Communication and coordination with partner institutions
– Hardware configuration and maintenance
– Data management (content storage, backup, integrity checks, deletion)
– Project management
– Planning
– Web and application server configuration and maintenance
– Security
– Hardware selection and replacement
– Content and metadata specifications
• Permissions
• Rights Management
– Copyright determination
– Copyright review
– Copyright information management (database)
• Bibliographic Data Management
– Entity description (record-level)
– Object identification (item-level)
– Data availability
• Collection Development
– Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
– Print: Cloud Library (effect of digital on print)
• Rightsholder permissions
• Disaster Recovery
• Logging
• Processes for ensuring content integrity
• e-Commerce
• Print on Demand
• Content Ingest
– Transformation
– Validation
• Content Access
– PageTurner
– Collection Builder
– Large-scale Search
• Quality Assurance
– Quality Review
– Content Certification
• User Services
– Usability
– User support (helpdesk)
• Financial contributions of partners
• Research Center
• Bibliographic Catalog
• APIs
• Outreach
– Project website
– Monthly newsletter
– Papers and presentations
– Communication with potential partners
– Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal
– Risk management (use of materials)
– Partner agreements
• Advocacy
Partners involved at many levels
Governance by partners
• Executive Committee
• Strategic Advisory Board
In-kind contribution from partners
• Working Groups
• Operations and development
Funding from partners
Working Groups
• Appointed by Strategic Advisory Board and Executive Committee
• Both operational and strategically focused groups
• Collections, Communications, Discovery Interface, Full-text Search, Usability, User Support
• Now 40+ people across the country
• Expertise from across the partnership
Cost Models
#1: “Contributor” model
• Partners pay per volume deposited
• Economies of scale keep costs low
#2: “Sustaining” model, holdings-based
• Partners share in costs of curating aggregate collection
• Share in uses of relevant materials
• Voice in future directions
• Sustaining common resource
• Quality of services increases
• Costs go down
HathiTrust Content
• HathiTrust content has been curated over time by librarians
– Mirrors collections of large research libraries
– Focus on quality
• Expanding non-Google content
– Public Domain: Copyright Review Management System
– Content from non-Google sources
• Internet Archive, image collections, government publications
Content Sources
* As of May 1, 2011
Content over time
[Chart: share of content by contributing library, 0% to 100%. Legend: Chicago, Madrid, Columbia, LoC, Harvard, Minnesota, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
* As of May 1, 2011
HathiTrust Content Growth
Content Distribution
8,234,081 – Total volumes
2,102,033 – Public Domain
4,527,381 Book titles
202,649 Serial titles
* As of March 5, 2011
Language Distribution (1)
The top 10 languages make up ~86% of all content
* As of May 1, 2011
Statistics and Visualizations
Language Distribution (2)
The next 40 languages make up ~13% of total
* As of May 1, 2011
Dates
* As of May 1, 2011
Collection Development and Management
Collections Committee
• Prioritization of collection development activities
• Appropriate principles for duplicate volumes
• Process for decision-making and prioritization for new content types
• Recommendations for tools and services
• Prioritization of copyright review and rights-clearing processes
• Print management proposal
A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
[Chart: % of titles in local collection (y-axis, 0% to 60%) against rank in 2008 ARL Investment Index (x-axis, 0 to 120)]
• June 2009: median duplication 19%
• June 2010: median duplication 31%
Continuing growth of overlap …
• ARL overlap
– 31% in June 2010
– 45% by December, 2010
• Oberlin Group overlap
– 41% in December, 2010
– Higher rate of overlap per added volume?
– Close to 50% in May, 2011
And yet every library is different
• Our median rate of overlap may be the same
• But our overlap profiles will differ by library
• Our use patterns differ
• Our risk profiles differ
• Our roles vis-à-vis our constituencies differ
• Thus, the need to act independently on common data
Extending the holdings database
• HathiTrust print holdings database
– Basis for new cost model (overlap of in-copyright)
– Basis for lawful uses (e.g., print disabilities, Section 108)
– A more complete picture than elsewhere
• Print monograph storage proposal
– Enable partners to register commitments
– Establish definitions (e.g., environment, use and condition)
– Build in cost-sharing: collectively fund those that make commitments
– Communicate information to partners to facilitate decision-making
Next steps?
• Draft proposal under development by the HathiTrust Collections Committee
– Focus on monographs
– Dovetail with other efforts
• Consideration (as part of new cost model) at Constitutional Convention in Oct. 2011
Changing Library Landscape
• Rapidly changing landscape
• Libraries are making these decisions, but increasingly they are collective decisions
• We can no longer afford to do separately work that could be done collaboratively
How to find out more
• Web site “About” section: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter: http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@info.hathitrust.org
Thank you!
Heather Christenson
University of California, California Digital Library
heather.christenson@ucop.edu