HATHI TRUST
A Shared Digital Repository

Unpacking HathiTrust’s New Cost Model
Jeremy York
Project Librarian, HathiTrust
SUNY
July 15, 2011

About

Partnership
Arizona State University
Boston University
Baylor University
California Digital Library
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Harvard University Library
Indiana University
Johns Hopkins University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
Michigan State University
New York University
New York Public Library
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Texas A&M University
Universidad Complutense de Madrid
University of California: Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz
The University of Chicago
University of Florida
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Michigan
University of Minnesota
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Yale University Library

Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
• “Light” archive
  – As accessible as possible within the bounds of law

Statistics
• 8,980,200 volumes
• 4,679,248 book titles
• 214,155 serial titles
• 2,450,522 “public domain”

The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy

Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Goals
• Comprehensive collection
• Preservation…with Access
• Shared strategies
  – Collection management, development
  – Preservation
  – Copyright
  – Efficient user services
• Openness

Governance
[Diagram: HathiTrust governance. Labels: Executive Committee; Strategic Advisory Board; HathiTrust; Budget/Finances; Decision-making; Guidance on Policy, Planning]

Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, UM
• Laine Farley, Executive Director, CDL
• John King, Vice Provost for Academic Information, UM
• Paula Kaufman, University Librarian and Dean of Libraries, UI
• Brian Schottlaender, University Librarian, UCSD
• Ed Van Gemert, Deputy Director of Libraries, UW-Madison (ex officio)
• Brenda Johnson, Dean of Libraries, IU
• Brad Wheeler, Chief Information Officer, IU
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

Strategic Advisory Board
• Ed Van Gemert (Chair), Deputy Director of Libraries, University of Wisconsin-Madison
• John Butler, AUL for Information Technology, University of Minnesota
• Patricia Cruse, Director, Preservation, CDL
• Todd Grappone, AUL for Digital Initiatives & IT, UCLA
• Julia Kochi, Director, Digital Library and Collections, UC San Francisco
• Sarah Pritchard, University Librarian, Northwestern University
• Paul Soderdahl, Director, LIT, University of Iowa
• John Wilkin, Executive Director, HathiTrust (ex officio)
• Robert Wolven, Columbia University

Constitutional Convention
• October 2011
• Delegates from each institution and consortium
  – Each carries a number of votes determined by a formula approved by the Executive Committee
• 3-year review
• Proposals
  – Print management
  – Ballot proposals

Partnership
• Who can become a partner?
  – Institutions worldwide
  – Libraries with print holdings

What are the benefits?
(1)
• Cost-effective long-term preservation and access services for digitized content
  – Commitments on digital content facilitate decisions about digitization efforts and print collection management
• For partners with content: immediate long-term preservation, bibliographic and full-text search, and collection-building
• With content or not: full viewing and downloading capabilities for public domain materials and materials for which we have received permissions

What are the benefits? (2)
• Specialized access to public domain and in-copyright materials for users with print disabilities
• Other lawful uses of in-copyright materials, such as Section 108 uses (print replacement copies, digital access to applicable works) and access to orphan works
• HathiTrust encourages participation in initiatives and resources geared toward
  – Shared collection development and management (e.g., copyright review work, print holdings database, de-duplication, collaboration with other organizations and initiatives)
  – Participation in governance and collaborative initiatives
  – Defining future directions of the shared library

What’s involved?
• Contract
  – Sustaining
  – Content-Contributing
• Yearly fees
• Commitment
  – 5-year periods
• Shibboleth
• Print Holdings

Costs
• Base funding from partner institutions
• Basic infrastructure costs
• Commitments in 5-year periods

How much does it cost? (1)

How much does it cost?
(2)
• $0.149/volume/year for Google-digitized
• $0.489/volume/year for IA-digitized
• $0.154/volume/year for all content
• $3.40 per GB

HathiTrust Functional Framework
[Diagram. Labels, in the order they appear:]
Governance; Budget, Finances; Decision-making; Policy; Enterprise Management; Repository Administration; Communication and Coordination with partner institutions; Hardware configuration and maintenance; Data management (content storage, backup, integrity checks, deletion); Project management; Planning; Web and application server configuration and maintenance; Security; Hardware selection and replacement; Content and Metadata specifications; Permissions; Rights Management; Bibliographic Data Management; Copyright determination; Entity description (record-level); Copyright review; Object identification (item-level); Copyright information management (database); Data availability
Collection Development
• Digital
  – Expansion beyond books and journals (born-digital, images and maps, audio)
  – Selection of content (for non-Google volume ingest and pilot projects)
• Print
  – Cloud Library (effect of digital on print)
Rightsholder permissions; Disaster Recovery; Logging; Processes for ensuring content integrity; e-Commerce; Print on Demand; Content Ingest; Content Access; Quality Assurance; User Services; Transformation; PageTurner; Quality Review; Usability; Validation; Collection Builder; Content Certification; User support (helpdesk); Large-scale Search; Financial contributions of partners; Research Center; Bibliographic Catalog; APIs
Outreach; Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC); Legal; Risk management (use of materials); Partner agreements; Advocacy

How does it work?
(1)
• Sustaining membership is base
  – Pricing model for all partners beginning 2013
  – Based on overlap of HathiTrust volumes with institutions’ print holdings
  – Share in infrastructure costs for public domain volumes:
    • (PD*C*X)/N
    • (as used in the example: PD = number of public domain volumes, C = cost per volume, X = multiplier, N = number of partners)
  – Share in infrastructure costs for in-copyright volumes based on holdings
    • For a given in-copyright volume:
    • IC=(C*X)/H
    • (H = number of institutions holding the volume)

How does it work? (2)
• Main factors in costs are
  – Amount of content
  – Number of partners
  – Also a flexible multiplier designed to pay for programmatic activities
• These tend to result in lower costs and more benefits over time

Example
• Factors
  – 1,000,000 PD volumes
  – 3,000,000 IC volumes
  – $0.154 per volume
  – 60 partners
  – Multiplier of 1.5
  – Assume each IC volume is held by 12 institutions on average
• Costs
  – PD = (1,000,000 * 0.154 * 1.5) / 60 = $3,850
  – IC = (3,000,000 * 0.154 * 1.5) / 12 = $57,750
  – Total = $61,600

How does it work? (3)
• In order to support these calculations
  – Need print holdings database (2013)
  – Update mechanisms
  – Manual remediation
• Analysis will also support
  – Expansion of legal uses of materials, to users who have print disabilities, to orphan works
  – Collaborative collection development and management operations
  – Efforts in de-duplication

Print Holdings Database
• Volumes institutions own or have owned
  – Only print volumes (not microform, etc.)
  – OCLC number [required]
  – Bib record ID [required]
  – Condition (e.g., brittle) [optional]
  – Holding Status (e.g., current holding, withdrawn, missing, etc.) [optional]

Percent Overlap
• Average = 37.4%

Questions
• Why not get the information from OCLC?
• Is it necessary to declare all volumes held, or could an institution choose not to declare some?
• Are the print holdings data currently provided by institutions taken as an indication of the volumes institutions are declaring they have access to?

What are we doing currently?
• Basing yearly fees on estimates
  – Based on infrastructure costs of anticipated content
  – Estimated partnership growth
  – Institution total volume counts

SUNY Costs
• SUNY University Centers
  – Albany, Binghamton, Buffalo, Stony Brook, Upstate and Downstate Medical Libraries
  – 11,049,952 volumes
• All SUNY (based on 16,000,000 titles)
  – 27 institutions total
  – 20,800,000 volumes

SUNY costs (2)
• Estimate using
  – 9,500,000 volumes at end of 2011
  – 60 partners (for University Centers and Medical libraries)
  – 87 partners (for all SUNY libraries)
  – Multiplier of 1.5

SUNY costs (3)
• University Centers
  – Public Domain
    • (Total PD cost * 1.5 / #partners) * 6 = $70,903.22
  – In Copyright
    • % of holdings (partner holdings / total holdings) * Total IC cost * 1.5 = $67,635.06
  – Total = $138,538.28
    • Prorated from August 1 = $58,072.21

SUNY costs (4)
• All SUNY
  – Public Domain
    • (Total PD cost * 1.5 / 87) * 27 = $220,044.49
  – In Copyright
    • % of holdings (partner holdings / total holdings) * Total IC cost * 1.5 = $127,198.61
  – Total = $347,243.09
    • Prorated from August 1 = $145,556.69

Sustaining v. Content-Contributing
• Sustaining partnership does not exclude contribution of content
• If a partner contributes content, costs are covered up to the amount it would pay as a Sustaining partner
  – Barring additional costs that might be needed to accommodate content (e.g., specialized load routines, generation of OCR)
• Above that, pay per-GB cost ($3.40)

Summary
• Partners share in costs of sustaining a common resource
• Share in uses of relevant materials
• Voice in future directions
• Costs to institutions go down
• Quality of services increases
  – An aggregated collection delivers something that distributed search or federation does not
• Free riders?
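
The sustaining-partner formulas and the worked example from the earlier slides can be sketched in Python. This is an illustration only, not HathiTrust's actual billing code: the function names are hypothetical, and the day-count proration rule (days remaining divided by 365, with the year taken to be 2011) is an assumption inferred from the fact that it reproduces the prorated SUNY figures; the slides do not state how proration is done.

```python
from datetime import date


def pd_share(pd_volumes, cost_per_volume, multiplier, partners):
    """Public domain share per partner: (PD * C * X) / N."""
    return pd_volumes * cost_per_volume * multiplier / partners


def ic_share(ic_volumes, cost_per_volume, multiplier, holders):
    """In-copyright share: (C * X) / H per volume, summed over the volumes held."""
    return ic_volumes * cost_per_volume * multiplier / holders


def prorate(annual_fee, start, year_end_exclusive, days_in_year=365):
    """Scale an annual fee by the days remaining in the year (assumed rule)."""
    days_left = (year_end_exclusive - start).days
    return annual_fee * days_left / days_in_year


# Worked example from the slides: 60 partners, multiplier 1.5,
# each in-copyright volume held by 12 institutions on average.
pd_cost = pd_share(1_000_000, 0.154, 1.5, 60)
ic_cost = ic_share(3_000_000, 0.154, 1.5, 12)
total = pd_cost + ic_cost
print(f"PD ${pd_cost:,.2f}  IC ${ic_cost:,.2f}  Total ${total:,.2f}")

# SUNY University Centers annual total, prorated from August 1
# (year assumed to be 2011, the year of the talk).
prorated = prorate(138_538.28, date(2011, 8, 1), date(2012, 1, 1))
print(f"Prorated ${prorated:,.2f}")
```

With these inputs the sketch reproduces the slide figures ($3,850, $57,750, $61,600, and $58,072.21). Note that `ic_share` uses a single average holder count, as the example does; under the per-volume formula IC=(C*X)/H, a real calculation would divide each volume's cost by that volume's own number of holding institutions.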
Changing Library Landscape
• Rapidly changing landscape
• Libraries are making these decisions, but they are more and more collective decisions
• We can no longer afford to do separately the work that could be done collaboratively

HathiTrust overall benefits to libraries
• Digital Curation
  – Drive costs down
  – Reduce “bibliographic indeterminacy”
  – Make meaningful decisions about formats and quality
  – Increase discoverability, use
  – Consolidate development talent
  – Improve strength of archiving
• Print Curation
  – Means to associate our print holdings
  – Coordinated record-keeping
• Subsidiary benefits
  – Quantify problems
  – Collective attention to solving shared problems

How to find out more
• Web site “About” section: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter: http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org

Thank you very much!