Unpacking HathiTrust’s New Cost Model

HATHI TRUST
A Shared Digital Repository
Unpacking HathiTrust’s
New Cost Model
Jeremy York
Project Librarian, HathiTrust
SUNY
July 15, 2011
About
Partnership
Arizona State University
Boston University
Baylor University
California Digital Library
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Harvard University Library
Indiana University
Johns Hopkins University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
Michigan State University
New York University
New York Public Library
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Texas A&M University
Universidad Complutense de Madrid
University of California: Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz
The University of Chicago
University of Florida
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Michigan
University of Minnesota
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Yale University Library
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal
content
• “Light” archive
– As accessible as possible within the bounds of law
Statistics
• 8,980,200 volumes
• 4,679,248 book titles
• 214,155 serial titles
• 2,450,522 “public domain”
The Name
• The meaning behind the name
– Hathi (hah-tee)--Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy
Mission
• To contribute to the common good by collecting,
organizing, preserving, communicating, and
sharing the record of human knowledge
Goals
• Comprehensive collection
• Preservation…with Access
• Shared strategies
– Collection management, development
– Preservation
– Copyright
– Efficient user services
• Openness
Governance
Governance
• Executive Committee: budget/finances, decision-making
• Strategic Advisory Board: guidance on policy, planning
Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, UM
• Laine Farley, Executive Director, CDL
• John King, Vice Provost for Academic Information, UM
• Paula Kaufman, University Librarian and Dean of Libraries, UI
• Brian Schottlaender, University Librarian, UCSD
• Ed Van Gemert, Deputy Director of Libraries, UW-Madison (ex officio)
• Brenda Johnson, Dean of Libraries, IU
• Brad Wheeler, Chief Information Officer, IU
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM
Strategic Advisory Board
• Ed Van Gemert (Chair), Deputy Director of Libraries, University
of Wisconsin - Madison
• John Butler, AUL for Information Technology, University of
Minnesota
• Patricia Cruse, Director, Preservation, CDL
• Todd Grappone, AUL for Digital Initiatives & IT, UCLA
• Julia Kochi, Director, Digital Library and Collections, UC San
Francisco
• Sarah Pritchard, University Librarian, Northwestern University
• Paul Soderdahl, Director, LIT, University of Iowa
• John Wilkin, Executive Director, HathiTrust (ex officio)
• Robert Wolven, Columbia University
Constitutional Convention
• October 2011
• Delegates from each institution and
consortium
– Carry certain number of votes determined
according to formula approved by Executive
Committee
• 3-year review
• Proposals
– Print management
– Ballot proposals
Partnership
Partnership
• Who can become a partner?
– Institutions worldwide
– Libraries with print holdings
What are the benefits? (1)
• Cost-effective long-term preservation and access services
for digitized content
– Commitments on digital content facilitate decisions about
digitization efforts and print collection management
• For those with content, immediately offering long-term
preservation, bibliographic and full-text search,
collection-building
• With content or not, full viewing and downloading
capabilities for public domain materials and materials for
which we have received permissions
What are the benefits? (2)
• Specialized access to public domain and in-copyright materials
for users with print disabilities
• Other lawful uses of in-copyright materials, such as Section
108 uses (print replacement copies, digital access to
applicable works) and access to orphan works
• HathiTrust encourages participation in initiatives and
resources geared toward
– Shared collection development and management (e.g., copyright
review work, print holdings database, de-duplication, collaboration
with other organizations and initiatives)
– Participation in governance and collaborative initiatives
– Defining future directions of the shared library.
What’s involved?
• Contract
– Sustaining
– Content-Contributing
• Yearly fees
• Commitment
– 5-year periods
• Shibboleth
• Print Holdings
Costs
• Base funding from partner institutions
• Basic infrastructure costs
• Commitments in 5-year periods
How much does it cost? (1)
How much does it cost? (2)
• $0.149/volume/year for Google-digitized
• $0.489/volume/year for IA-digitized
• $0.154/volume/year for all content
• $3.40 per GB
HathiTrust Functional Framework
[Diagram: labels grouped by functional area where the slide layout makes the grouping clear]
• Governance: budget and finances; decision-making; policy
• Enterprise Management / Repository Administration: communication and coordination with partner institutions; hardware configuration and maintenance; data management (content storage, backup, integrity checks, deletion); project management; planning; web and application server configuration and maintenance; security; hardware selection and replacement; content and metadata specifications
• Rights Management: copyright determination; copyright review; copyright information management (database)
• Bibliographic Data Management: entity description (record-level); object identification (item-level); data availability
• Collection Development
– Digital: expansion beyond books and journals (born-digital, images and maps, audio); selection of content (for non-Google volume ingest and pilot projects)
– Print: Cloud Library (effect of digital on print)
• e-Commerce: Print on Demand
• Content Ingest: transformation; validation
• Content Access: PageTurner; Collection Builder; large-scale search
• Quality Assurance: quality review; content certification
• User Services: usability; user support (helpdesk)
• Outreach: project website; monthly newsletter; papers and presentations; communication with potential partners; surveys, general inquiries
• Legal: risk management (use of materials); partner agreements; advocacy
• Other labeled activities: permissions; rightsholder permissions; disaster recovery; logging; processes for ensuring content integrity; financial contributions of partners; Research Center; bibliographic catalog; APIs; repository evaluation and audit (e.g., DRAMBORA, TRAC)
How does it work? (1)
• Sustaining membership is base
– Pricing model for all partners beginning 2013
– Based on overlap of HathiTrust volumes with
institutions’ print holdings
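– Share in infrastructure costs for public domain
volumes:
• (PD*C*X)/N, where PD = number of public domain volumes, C = cost per volume, X = multiplier, N = number of partners
– Share in infrastructure costs for in-copyright
volumes based on holdings
• For a given in-copyright volume:
• IC=(C*X)/H, where H = number of partners holding the volume in print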
How does it work? (2)
• Main factors in costs are
– Amount of content
– Number of partners
– Also a flexible multiplier designed to pay for
programmatic activities
• Tend to result in lower costs and more
benefits over time
Example
• Factors
– 1,000,000 PD volumes
– 3,000,000 IC volumes
– $0.154 per volume
– 60 partners
– Multiplier of 1.5
– Assume on average 12 institutions hold IC volumes
• Costs
– PD = (1,000,000 * .154 * 1.5) / 60 = $3,850
– IC = (3,000,000 * .154 * 1.5) / 12 = $57,750
– Total = $61,600
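The example arithmetic can be reproduced with a short script. This is a minimal sketch, not HathiTrust's own tooling: the function and parameter names are illustrative, and the in-copyright share is simplified to the slide's assumption that every in-copyright volume is held by the same average number of institutions.

```python
def pd_share(pd_volumes, cost_per_volume, multiplier, partners):
    """Public domain share, (PD*C*X)/N: costs split evenly across all partners."""
    return pd_volumes * cost_per_volume * multiplier / partners


def ic_share(ic_volumes, cost_per_volume, multiplier, avg_holders):
    """In-copyright share, (C*X)/H summed over volumes.

    Simplification (an assumption taken from the example): every volume
    has the same number of holding institutions, avg_holders.
    """
    return ic_volumes * cost_per_volume * multiplier / avg_holders


# Figures from the example slide.
pd_cost = pd_share(1_000_000, 0.154, 1.5, 60)
ic_cost = ic_share(3_000_000, 0.154, 1.5, 12)
total = pd_cost + ic_cost

print(f"PD    = ${pd_cost:,.2f}")   # PD    = $3,850.00
print(f"IC    = ${ic_cost:,.2f}")   # IC    = $57,750.00
print(f"Total = ${total:,.2f}")     # Total = $61,600.00
```

Because the public domain share divides by the number of partners, adding partners lowers each partner's PD cost, while the in-copyright share depends only on how many institutions hold each volume.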
How does it work? (3)
• In order to support these calculations
– Need print holdings database (2013)
– Update mechanisms
– Manual remediation
• Analysis will also support
– Expansion of legal uses of materials: access for users who
have print disabilities, access to orphan works
– Collaborative collection development
and management operations
– De-duplication efforts
Print Holdings Database
• Volumes institutions own or have owned
– Only print volumes (not microform, etc.)
– OCLC number [required]
– Bib record ID [required]
– Condition (e.g., brittle) [optional]
– Holding Status (e.g., current holding, withdrawn,
missing, etc.) [optional]
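One way to picture a print holdings record is as a small data structure with two required and two optional fields. The sketch below is an assumption for illustration only: the class name, field names, and sample values are hypothetical, and the slides do not specify a submission format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrintHolding:
    """One print volume an institution owns or has owned (hypothetical layout)."""
    oclc_number: str                      # required
    bib_record_id: str                    # required (institution's local record ID)
    condition: Optional[str] = None       # optional, e.g. "brittle"
    holding_status: Optional[str] = None  # optional, e.g. "current holding", "withdrawn", "missing"

    def __post_init__(self):
        # Reject records that omit either required identifier.
        if not self.oclc_number or not self.bib_record_id:
            raise ValueError("oclc_number and bib_record_id are required")


# Sample values are made up for illustration.
record = PrintHolding("12345678", "b1000001", condition="brittle",
                      holding_status="withdrawn")
minimal = PrintHolding("87654321", "b1000002")
```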
Percent Overlap
Average = 37.4%
Questions
• Why not get the information from OCLC?
• Is it necessary to declare all volumes held, or
could an institution choose not to declare
some?
• Are the print holdings data currently provided
by institutions taken as an indication of the
volumes institutions are declaring they have
access to?
What are we doing currently?
• Basing yearly fees on estimates
– Based on infrastructure costs of anticipated
content
– Estimated partnership growth
– Institution total volume counts
SUNY Costs
• SUNY University Centers
– Albany, Binghamton, Buffalo, Stony Brook, Upstate
and Downstate Medical Libraries
– 11,049,952 volumes
• All SUNY (based on 16,000,000 titles)
– 27 institutions total
– 20,800,000 volumes
SUNY costs (2)
• Estimate using
– 9,500,000 volumes at end of 2011
– 60 partners (for University Centers and Medical
libraries)
– 87 partners (for all SUNY libraries)
– Multiplier of 1.5
SUNY costs (3)
• University Centers
– Public Domain
• (Total PD cost * 1.5 / 60 partners) * 6 institutions = $70,903.22
– In Copyright
• % of holdings (partner holdings / total holdings) * Total
IC cost * 1.5 = $67,635.06
– Total = $138,538.28
• Prorated from August 1 = $58,072.21
SUNY costs (4)
• All SUNY
– Public Domain
• (Total PD cost * 1.5 / 87 partners) * 27 institutions = $220,044.49
– In Copyright
• % holdings (partner holdings / total holdings) * Total IC
cost * 1.5 = $127,198.61
– Total = $347,243.09
• Prorated from August 1 = $145,556.69
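The slides do not state how the prorated figures were derived. A day-count proration (days from August 1 through December 31, divided by the days in the year) reproduces both prorated totals, so the sketch below uses that as an assumption; the function name is illustrative.

```python
from datetime import date


def prorate(annual_fee, start):
    """Prorate an annual fee by the share of calendar days remaining from `start`.

    Assumption: day-count proration within the calendar year of `start`.
    """
    year_start = date(start.year, 1, 1)
    next_year = date(start.year + 1, 1, 1)
    days_in_year = (next_year - year_start).days   # 365 for 2011
    days_remaining = (next_year - start).days      # 153 from August 1
    return annual_fee * days_remaining / days_in_year


# Annual totals from the two SUNY cost slides.
centers = prorate(138_538.28, date(2011, 8, 1))
all_suny = prorate(347_243.09, date(2011, 8, 1))

print(f"University Centers: ${centers:,.2f}")   # University Centers: $58,072.21
print(f"All SUNY:           ${all_suny:,.2f}")  # All SUNY:           $145,556.69
```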
Sustaining v. Content-Contributing
• Sustaining partnership does not exclude contribution of content
• If contribute content, costs covered up to
amount that would be paid as Sustaining
partner
– Barring additional costs that might be needed to
accommodate content (e.g., specialized load
routines, generation of OCR)
• Above that, pay per-GB cost ($3.40)
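One possible reading of this rule, sketched under assumptions: contributed content is costed at the per-GB rate, the Sustaining fee covers that cost up to its own amount, and only the excess is billed on top. The function name, the `extra_costs` parameter, and the sample sizes are hypothetical; the actual billing rules may differ.

```python
PER_GB_RATE = 3.40  # per-GB cost from the slides


def content_contributor_fee(sustaining_fee, content_gb, extra_costs=0.0):
    """Yearly fee for a partner that also contributes content (one interpretation).

    extra_costs stands in for items such as specialized load routines
    or OCR generation, which the slide says are not covered.
    """
    storage_cost = content_gb * PER_GB_RATE
    overage = max(0.0, storage_cost - sustaining_fee)
    return sustaining_fee + overage + extra_costs


# Hypothetical partner whose Sustaining fee is $61,600.
small = content_contributor_fee(61_600, 10_000)   # $34,000 of content, fully covered
large = content_contributor_fee(61_600, 25_000)   # $85,000 of content, $23,400 over

print(f"{small:,.2f}")  # 61,600.00
print(f"{large:,.2f}")  # 85,000.00
```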
Summary
• Partners share in costs of sustaining common
resource
• Share in uses of relevant materials
• Voice in future directions
• Costs to institutions go down
• Quality of services increases
– An aggregated collection delivers something that
distributed search or federation does not
• Free riders?
Changing Library Landscape
• Rapidly changing landscape
• Libraries are making these decisions but they
are more and more collective decisions
• We can no longer afford to do separately
work that could be done collaboratively
HathiTrust overall benefits to libraries
• Digital Curation
– Drive costs down
– Reduce “bibliographic indeterminacy”
– Make meaningful decisions about formats and quality
– Increase discoverability, use
– Consolidate development talent
– Improve strength of archiving
• Print Curation
– Means to associate our print holdings
– Coordinated record-keeping
• Subsidiary benefits
– Quantify problems
– Collective attention to solving shared problems
How to find out more
• Web site “About” section: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter: http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
Thank you very much!