HathiTrust
Sharing a Federal Print Repository: Issues and Opportunities
May 25, 2011
Heather Christenson

Overview
• About HathiTrust as an organization
• Characteristics of the HathiTrust collection
• Collection development and management
• Changing library landscape
• How print management fits in

HathiTrust is a…
• Partnership of 50+ research libraries, founded in October 2008
• Distributed, collaborative enterprise
• Trusted digital preservation repository
• Digital library
• Service provider
• Platform to collectively explore new library models

Goals
• Comprehensive collection
• Preservation…with access
• Shared strategies
  – Collection management and development
  – Preservation
  – Copyright review
  – Efficient user services
• Openness

[Figure: HathiTrust Functional Framework. Functional areas: Enterprise Management; Repository Administration; Rights Management; Bibliographic Data Management; Collection Development (digital: expansion beyond books and journals to born-digital content, images and maps, and audio, plus selection of content for non-Google volume ingest and pilot projects; print: Cloud Library, the effect of digital on print); Content Ingest; Content Access; Quality Assurance; User Services; Outreach; Legal]

Partners involved at many levels
• Governance by partners
  – Executive Committee
  – Strategic Advisory Board
• In-kind contribution from partners
  – Working Groups
  – Operations and development
• Funding from partners

Working Groups
• Appointed by the Strategic Advisory Board and Executive Committee
• Both operational and strategically focused groups
• Collections, Communications, Discovery Interface, Full-text Search, Usability, User Support
• Now 40+ people across the country
• Expertise from across the partnership

Cost Models
#1: “Contributor” model
• Partners pay per volume deposited
• Economies of scale keep costs low
#2: “Sustaining” model, holdings-based
• Partners share in the costs of curating the aggregate collection
• Share in uses of relevant materials
• Voice in future directions
• Sustaining a common resource
• Quality of services increases
• Costs go down

HathiTrust Content
• HathiTrust content has been curated over time by librarians
  – Mirrors collections of large research libraries
  – Focus on quality
• Expanding non-Google content
  – Public domain: Copyright Review Management System
  – Content from non-Google sources
    • Internet Archive, image collections, government publications

Content Sources
[Chart: content sources]
* As of May 1, 2011

Content over time
[Chart: share of content (0 to 100%) over time by contributing institution: Michigan, California, Wisconsin, Cornell, NYPL, Princeton, Indiana, Minnesota, Harvard, LoC, Columbia, Madrid, Chicago]
* As of May 1, 2011

HathiTrust Content Growth
[Chart: HathiTrust content growth]

Content Distribution
• 8,234,081 total volumes
• 2,102,033 public domain volumes
• 4,527,381 book titles
• 202,649 serial titles
* As of March 5, 2011

Language Distribution (1)
• The top 10 languages
  make up ~86% of all content
* As of May 1, 2011
Statistics and Visualizations

Language Distribution (2)
• The next 40 languages make up ~13% of the total
* As of May 1, 2011

Dates
[Chart: content by date]
* As of May 1, 2011

Collection Development and Management

Collections Committee
• Prioritization of collection development activities
• Appropriate principles for duplicate volumes
• Process for decision-making and prioritization for new content types
• Recommendations for tools and services
• Prioritization of copyright review and rights-clearing processes
• Print management proposal

A global change in the library environment
[Chart: Academic print book collection already substantially duplicated in the mass-digitized book corpus. Y-axis: % of titles in local collection; x-axis: rank in 2008 ARL Investment Index. Median duplication: 19% in June 2009, 31% in June 2010.]

Continuing growth of overlap…
• ARL overlap
  – 31% in June 2010
  – 45% by December 2010
• Oberlin Group overlap
  – 41% in December 2010
  – Higher rate of overlap per added volume?
  – Close to 50% in May 2011

And yet every library is different
• Our median rate of overlap may be the same
• But our overlap profiles will differ by library
• Our use patterns differ
• Our risk profiles differ
• Our roles vis-à-vis our constituencies differ
• Thus, the need to act independently on common data

Extending the holdings database
• HathiTrust print holdings database
  – Basis for new cost model (overlap of in-copyright volumes)
  – Basis for lawful uses (e.g., print disabilities, Section 108)
  – A more complete picture than is available elsewhere
• Print monograph storage proposal
  – Enable partners to register commitments
  – Establish definitions (e.g., environment, use, and condition)
  – Build in cost-sharing: collectively fund those that make commitments
  – Communicate information to partners to facilitate decision-making

Next steps?
• Draft proposal under development by the HathiTrust Collections Committee
  – Focus on monographs
  – Dovetail with other efforts
• Consideration (as part of the new cost model) at the Constitutional Convention in October 2011

Changing Library Landscape
• Rapidly changing landscape
• Libraries are making these decisions, but increasingly they are collective decisions
• We can no longer afford to do separately work that could be done collaboratively

How to find out more
• Web site “About” section: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter: http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@info.hathitrust.org

Thank you!
Heather Christenson
University of California, California Digital Library
heather.christenson@ucop.edu