HATHITRUST
A Shared Digital Repository

Collaborating Globally, Planning Locally
HathiTrust and New Opportunities in Collection Management
GWLA/UNM: Emerging Collection Management Opportunities
September 26, 2011
Jeremy York
Project Librarian, HathiTrust

Partnership
• Arizona State University
• Baylor University
• Boston University
• California Digital Library
• Columbia University
• Cornell University
• Dartmouth College
• Duke University
• Emory University
• Getty Research Institute
• Harvard University Library
• Indiana University
• Johns Hopkins University
• Lafayette College
• Library of Congress
• Massachusetts Institute of Technology
• McGill University
• Michigan State University
• New York Public Library
• New York University
• North Carolina Central University
• North Carolina State University
• Northwestern University
• The Ohio State University
• The Pennsylvania State University
• Princeton University
• Purdue University
• Stanford University
• Texas A&M University
• Universidad Complutense de Madrid
• University of Calgary
• University of California: Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz
• The University of Chicago
• University of Connecticut
• University of Florida
• University of Illinois
• University of Illinois at Chicago
• The University of Iowa
• University of Maryland
• University of Miami
• University of Michigan
• University of Minnesota
• University of Missouri
• University of Nebraska-Lincoln
• The University of North Carolina at Chapel Hill
• University of Notre Dame
• University of Pennsylvania
• University of Pittsburgh
• University of Utah
• University of Virginia
• University of Washington
• University of Wisconsin-Madison
• Utah State University
• Yale University Library

The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy

Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Collections and Collaboration
• Comprehensive collection - Preservation…with Access
• Shared strategies
  – Collection management, development
  – Copyright
  – Preservation (digital and print)
  – Bibliographic Indeterminacy
  – Discovery / Use
  – Efficient user services
• Public Good

Governance
[Diagram: governance structure. Labels: Budget/Finances; Decision-making; Strategic Advisory Board; Executive Committee; HathiTrust; Guidance on Policy, Planning]

How does work get done?
• Collective work
  – e.g., working groups
  – Perform the work of the partnership
  – Now 40+ people across partner institutions
• Distributed work
  – Driven by needs of institutions, able to leverage across the partnership
  – Projects, e.g. grant work, ingest specifications, page-turner, bibliographic data management
• Leverage expertise across institutions

How is work prioritized?
• Initial functional objectives
• Collective processes
  – Working groups and committees
• Constitutional Convention
  – Ballot Proposals

Partnership
• Who can become a partner?
  – Institutions worldwide
  – Libraries with print holdings

What are the benefits? (1)
• Cost-effective long-term preservation and access services for digitized content
  – Commitments on digital content facilitate decisions about digitization efforts and print collection management
• For those with content, immediately offering long-term preservation, bibliographic and full-text search, collection-building
• With content or not, full viewing and downloading capabilities for public domain materials and materials for which we have received permissions

What are the benefits? (2)
• Specialized access to public domain and in-copyright materials for users with print disabilities
• Other lawful uses of in-copyright materials such as Section 108 uses (print replacement copies, digital access to applicable works), orphan works
• HathiTrust encourages participation in initiatives and resources geared toward
  – Shared collection development and management (e.g., copyright review work, print holdings database, de-duplication, collaboration with other organizations and initiatives)
  – Participation in governance and collaborative initiatives
  – Defining directions for the shared library

What’s involved?
• Contract
  – Sustaining
  – Content-Contributing
• Yearly fees
• Commitment
  – 5-year periods
• Shibboleth
• Print Holdings

Consortial Membership?
• No pricing benefit for consortia
• Benefits arise where consortia can offer services to reduce costs for members
  – Coordinating ingest, print holdings, other?

Collections: what we have

Content Distribution
[Pie chart: In Copyright 73%; Public Domain 27%]
• Total volumes: 9,706,923
• “Public domain” volumes: 2,636,483
• Book titles: 5,153,036
• Serial titles: 255,907
* As of September 24, 2011

Content Distribution
[Pie chart: rights breakdown]
• In Copyright: 73%
• "Public Domain": 27%
  – Public Domain (worldwide): 14%
  – Public Domain (US): 10%
  – US Gov Docs: 3%
  – Open Access: .1%
  – Creative Commons: .01%
* As of September 24, 2011

Dates
[Pie chart: share of content by publication date]
  0-1500       0%
  1500-1599    0%
  1600-1699    0%
  1700-1799    1%
  1800-1849    3%
  1850-1899    7%
  1900-1909    4%
  1910-1919    4%
  1920-1929    4%
  1930-1939    4%
  1940-1949    4%
  1950-1959    6%
  1960-1969   11%
  1970-1979   13%
  1980-1989   15%
  1990-1999   14%
  2000-2009   10%
* As of September 24, 2011

Breakdown of HathiTrust book corpus by publication date
[Figure: breakdown of HathiTrust book corpus by publication date. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building, 2/2011]

Language Distribution (1)
[Pie chart: the top 10 languages make up ~86% of all content]
  English               48%
  German                 9%
  French                 7%
  Spanish                5%
  Chinese                4%
  Russian                4%
  Japanese               3%
  Italian                3%
  Arabic                 2%
  Latin                  1%
  Remaining Languages   14%
* As of September 24, 2011

Language Distribution (2)
[Pie chart: the next 40 languages make up ~13% of total. The per-language percentages (between 1% and 7% each) cannot be matched to their labels in the extracted text.]
• Languages shown: Romanian, Ancient Greek, Multiple, Malayalam, Sanskrit, Greek, Slovak, Malay, Bulgarian, Catalan, Portuguese, Bengali, Ukrainian, Armenian, Marathi, Panjabi, Finnish, Serbian, Slovenian, Vietnamese, Undetermined, Polish, Norwegian, Dutch, Hungarian, Music, Hebrew, Tamil, Hindi, Persian, Indonesian, Korean, Unknown, Croatian, Czech, Thai, Urdu, Turkish, Swedish, Danish
* As of September 24, 2011

Content over time
[Chart: share of content by contributing institution, y-axis 0% to 100%. Legend: Virginia, Madrid, Columbia, LoC, Harvard, Minnesota, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
* As of September 24, 2011

HathiTrust Content Growth

Collaboration: Collection Management

The Cloud Library
• Toward a Cloud Library
  – CLIR, Mellon Foundation
  – OCLC Research, NYU, HathiTrust, ReCAP Libraries
• Objective: Characterize the near-term opportunity for externalizing management of academic research collections leveraging capacity of large-scale shared print and digital repositories
• Outcomes: opportunity and risk assessment based on aggregate collection analysis; draft service agreement enabling generic consumer library to selectively outsource preservation and access of low-use research collections to large-scale print and digital repositories
(Malpas, RLG Partner Update, January 2010)

A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
[Chart: % of Titles in Local Collection (y-axis, 0% to 60%) against Rank in 2008 ARL Investment Index (x-axis, 0 to 120)]
• June 2009 median duplication: 19%
• June 2010 median duplication: 31%

Continuing growth of overlap …
• ARL overlap
  – 31% in June 2010
  – 33% in Dec (adjustment: adding little-held works)
  – ~ 1% per 225,000 vols
  – 45% by December, 2011
• Oberlin Group overlap
  – Close to 9 percentage points higher
  – 41% in December, 2010
  – Close to 50% in May, 2011
  – Higher rate of overlap per added volume?

Digitized Books in Shared Repositories
[Chart: title counts (y-axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M Unique Titles]
• ~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories

New Cost Model
• Original model based on GB contributed
• New model based on overlap of print collections with HathiTrust digital collections
• Supported by print holdings database
• Database will
  – Support expansion of legal uses of materials: preservation uses, access for users who have print disabilities, access to orphan works
  – Facilitate individual and collaborative collection development and management operations
  – Benefit efforts in de-duplication

Print Holdings Database
• Volumes institutions own or have owned
  – For monographic holdings
    - Only print volumes (not microform, etc.)
    - OCLC number [required]
    - Bib record ID [required]
    - Enumeration/chronology, if available
    - Condition (e.g., brittle) [optional]
    - Holding Status (e.g., current holding, withdrawn, missing, etc.) [optional]
  – For serial holdings
    - OCLC number [required]
    - Bib record ID [required]
    - ISSN, if available

Every library is different
• Our median rate of overlap may be the same
• But our overlap profiles will differ by library
• Our use patterns differ
• Our risk profiles differ
• Our roles vis-à-vis our constituencies differ
• Thus, the need to act independently on common data

Cooperative Print Monograph Archive
• Print monograph storage proposal
  – Enable partners to register commitments
  – Establish definitions (e.g., environment, use and condition)
  – Build in cost-sharing: collectively fund those that make commitments
  – Communicate information to partners to facilitate decision-making
http://www.hathitrust.org/constitutional_convention2011

Quality
• IMLS grant led by Paul Conway
• Metrics and measures of quality
• Certification

Changing Library Landscape
Print Monograph Archive Proposal (HathiTrust Collections Committee):
• “…the potential for ubiquitous information access…challenges the very foundation underlying the development of vast collections of printed literature in our nation’s libraries.”
• “This model for collection development and access…is today becoming less and less relevant to our core mission”
• “From its inception, HathiTrust has aspired to reshape the landscape of research libraries. This landscape includes the management of vast, highly-redundant collections of printed resources for which readily accessible digital instantiations are increasingly available.”
• “With the advent of HathiTrust…the opportunity exists for our institutions to not only work together to profoundly influence the landscape in which we provide access to cultural resources but to profoundly influence the mechanisms by which we ensure the persistence of the printed record”

Thank you!

How to find out more
• Web site “About” section: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter: http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
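
Supplementary sketch: print holdings and overlap
The Print Holdings Database and New Cost Model slides describe a holdings record (OCLC number and bib record ID required; enumeration/chronology, condition, and holding status optional) and a model based on how far a library's print collection overlaps the digital collection. The Python below is a minimal illustration of that idea, not HathiTrust's actual data format or fee calculation. The class, field, and function names are hypothetical, and matching on OCLC number alone is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class MonographHolding:
    """One print monograph holding, using the data elements named on the slide.

    Field names are hypothetical; the slide lists the elements only.
    """
    oclc_number: str                      # [required]
    bib_record_id: str                    # [required]
    enum_chron: Optional[str] = None      # enumeration/chronology, if available
    condition: Optional[str] = None       # e.g. "brittle" [optional]
    holding_status: Optional[str] = None  # e.g. "withdrawn" [optional]


def missing_required(record: MonographHolding) -> List[str]:
    """Return the names of required fields that are empty or blank."""
    missing = []
    if not record.oclc_number.strip():
        missing.append("oclc_number")
    if not record.bib_record_id.strip():
        missing.append("bib_record_id")
    return missing


def overlap_rate(holdings: Iterable[MonographHolding],
                 digitized_oclc: Set[str]) -> float:
    """Share of distinct held OCLC numbers that also appear in the digital set.

    Records missing a required field are skipped. Matching by OCLC number
    alone is an assumption of this sketch.
    """
    held = {h.oclc_number for h in holdings if not missing_required(h)}
    if not held:
        return 0.0
    return len(held & digitized_oclc) / len(held)


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    holdings = [
        MonographHolding("1001", "b1"),
        MonographHolding("1002", "b2", condition="brittle"),
        MonographHolding("1003", "b3", holding_status="withdrawn"),
        MonographHolding("1004", "b4", enum_chron="v.2"),
    ]
    digitized = {"1002", "1004", "9999"}
    print(f"overlap: {overlap_rate(holdings, digitized):.0%}")  # overlap: 50%
```

Each library can run the same calculation on its own holdings against the shared set of digitized titles, which is one way to read "act independently on common data". How an overlap figure would translate into fees is not stated on the slides and is left out here.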