HATHITRUST
A Shared Digital Repository

Sharing Collections through Shared Stewardship
A HathiTrust Progress Report

Northwestern University
October 21, 2014
Mike Furlough
Executive Director, HathiTrust

Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.

Today’s Conversation
• Do you really know HathiTrust?
  – How things work
  – Collections and data
• What are we working on now?
• How has the world changed since we began?
  – And what does that mean for HathiTrust?

The Partnership

Mission
To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
Efforts include, but are not limited to
…building comprehensive collections co-owned and managed by partners.
…enabling access by users with print disabilities.
…supporting computational research with the collections.
…stimulating shared collection storage strategies among libraries.

Timeline: Highlights
• Google Library Project announced (2004)
• Launch (2008)
• TRAC certification (2011)
• Constitutional convention (2011)
• 10 million volumes (2012)
• New governance established (2012)
• Current bylaws and fee structure (2013)
• 12 million volumes (2014)

HathiTrust Members
Allegheny College; Arizona State University; Baylor University; Boston College; Boston University; Brandeis University; Brown University; California Digital Library; Carnegie Mellon University; Colby College; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Iowa State University; Johns Hopkins University; Kansas State University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; Montana State University; Mount Holyoke College; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Rutgers University; Stanford University; Syracuse University; Temple University; Texas A&M University; Texas Tech; Tufts University; Universidad Complutense de Madrid; University of Alabama; University of Alberta; University of Arizona; University of British Columbia; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Delaware; University of Florida; University of Houston; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Kansas; University of Maine; University of Maryland; University of Massachusetts, Amherst; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; University of New Mexico; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Oklahoma; University of Pennsylvania; University of Pittsburgh; University of Queensland; University of Tennessee, Knoxville; University of Texas; University of Utah; University of Vermont; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Vanderbilt University; Virginia Tech; Wake Forest University; Washington University; Yale University Library

Growth in Membership
Members by year (chart):
• 2008: 13
• 2009: 25
• 2010: 49
• 2011: 64
• 2012: 77
• 2013: 91
• 2014: 101

Shared Responsibilities
• Leverage expertise across institutions
  – Collective work
• Distributed infrastructure
  – Preservation repository and access services
    • University of Michigan
    • Mirror site: Indiana University
  – Metadata management services (Zephir)
    • California Digital Library
  – HathiTrust Research Center
    • Indiana University and University of Illinois

Governance
Organization chart: Board of Governors; Program Steering Committee; HathiTrust Members; Committees and Working Groups; Operations; Executive Director

HathiTrust Board of Governors
Five-year terms (beginning Jan 2013):
• Betsy Wilson (University of Washington)
• Bob Wolven (Columbia University)
Four-year terms:
• Richard Clement (University of New Mexico)
• Patricia Steele (University of Maryland)
Three-year terms:
• Carol Mandel (New York University)
• Sarah Michalak (University of North Carolina-Chapel Hill)
Members appointed by the founding institutions:
• James Hilton (University of Michigan)
• Carol Diedrichs (Ohio State University)
• Laine Farley (California Digital Library)
• Wendy Lougee (University of Minnesota)
• Brian Schottlaender (UC, San Diego)
• Brenda Johnson (Indiana University)
Ex officio (Board, PSC, Executive Committee):
• Mike Furlough, Executive Director
Executive Committee:
• Chair
• Past Chair
• Treasurer
• Chair of PSC
• Executive Director

Program Steering Committee
• Serves at the direction of the Board of Governors:
  – Reviews HathiTrust’s development agenda, shaping initiatives and strategies for Board discussion and decision-making, and considering the implications of those initiatives for the future.
  – Recommends alterations in the development agenda based on such reviews.
  – Based on its reviews, develops position papers for the member community to encourage debate or mobilize discussion on particular issues.
  – Works with the Board of Governors to develop policies for HathiTrust and its members.

Program Steering Committee Membership
• Ivy Anderson (CDL)
• John Butler (Minnesota)
• Chris Freeland (Washington University)
• Todd Grappone (UCLA)
• Martha Hruska (UC San Diego)
• Martin Kurth (New York University)
• Erika Linke (Carnegie Mellon University)
• Robert McDonald (Indiana)
• Matthew Sheehy (Harvard)
• Elaine Westbrooks (Michigan)
• Bob Wolven, Chair (Columbia)

PSC Actions 2013-2014
• Initial focus on Constitutional Convention Ballot Initiatives
• Re-established Collections Committee
• Created Rights & Access Working Group
• Charged Zephir Advisory Group

Standing Committees and Working Groups
• Collections Committee
• Rights and Access Working Group
• User Support Working Group

Collections and Access

Growth of Collection
Volumes by year (chart):
• 2008: 2,477,871
• 2009: 5,221,092
• 2010: 7,836,698
• 2011: 9,966,572
• 2012: 10,599,355
• 2013: 10,878,121
• 2014: 12,104,793

Preservation with Access
• Preservation
  – TRAC-certified
  – Long-term commitments on digital content facilitate planning, decision-making
• Discovery
  – Bibliographic and full-text search of all materials
  – Mechanisms for local loading of records
• Access and Use
  – Full-text search (all users)
  – Public domain and open access works (all users)
  – Collections and APIs (all users)
  – Lawful uses of in-copyright works (members)

Content Sources
Share of content by source institution (pie chart):
• University of Michigan, 42.52%
• University of California, 31.47%
• University of Wisconsin, 5.06%
• Cornell University, 4.02%
• New York Public Library, 2.63%
• University of Virginia, 0.46%
• University of Chicago, 0.36%
• Northwestern University, 0.34%
• Duke University, 0.25%
• Yale University, 0.22%
• University of Florida, 0.09%
• North Carolina State University, 0.03%
• Boston College, 0.02%
• Ohio State, 0.01%
• Further sources, each between 0.00% and 2.29%: Princeton University, Harvard University, Library of Congress, University of Minnesota, University of Illinois, Columbia University, Indiana University, Keio University, Penn State, Purdue University, University of North Carolina at Chapel Hill, Texas A&M University, Universidad Complutense, Utah State University

Dates
Share of content by publication date (pie chart)*:
• 0-1500, 0.04%
• 1500-1599, 0.07%
• 1600-1699, 0.01%
• 1700-1799, 0.01%
• 1800-1849, 3%
• 1850-1899, 10%
• 1900-1909, 4%
• 1910-1919, 4%
• 1920-1929, 4%
• 1930-1939, 4%
• 1940-1949, 4%
• 1950-1959, 6%
• 1960-1969, 11%
• 1970-1979, 13%
• 1980-1989, 14%
• 1990-1999, 14%
• 2000-2009, 10%
* As of February 17, 2014

Language Distribution (1)
The top 10 languages make up ~87% of all content*:
• English, 49%
• German, 9%
• French, 7%
• Spanish, 5%
• Chinese, 4%
• Russian, 4%
• Japanese, 3%
• Italian, 3%
• Arabic, 2%
• Latin, 1%
• Remaining languages, 13%
* As of February 17, 2014

Language Distribution (2)
The next 40 languages make up ~12% of total*. Shares within this group:
• 7% each: Polish, Portuguese
• 5% each: Dutch, Hebrew, Hindi
• 4% each: Indonesian, Korean, Swedish
• 3% each: Croatian, Czech, Danish, Thai, Turkish, Urdu
• 2% each: Bengali, Greek, Modern (1453-), Hungarian, Norwegian, Persian, Sanskrit, Tamil
• 1% each: Armenian, Bulgarian, Catalan, Finnish, Greek, Ancient (to 1453), Malay, Malayalam, Marathi, Multiple languages, Panjabi, Romanian, Serbian, Slovak, Slovenian, Telugu, Turkish, Ottoman, Ukrainian, Vietnamese, Yiddish
• 0%: Nepali
* As of February 17, 2014

Access: Lawful uses of in-copyright works
• Sensitive to multiple legal regimes
  – Full-text search (everyone everywhere)
  – Access for users who have print disabilities (through member proxy in US, and where law permits)**
  – Access to works that are damaged or missing and also out of print and unavailable (members in US only)
**Terms and conditions at http://www.hathitrust.org/access_use#ic-access

Collective Action: Copyright Review
• Copyright Review Management System
  – CRMS-US: works published in the US, 1923-1963
  – CRMS-World: works published outside the US (UK, Canada, Australia, Spain)
  – Through both projects over 450,000 items reviewed
  – 52% determined to have some public domain status

Copyright Distribution
• In Copyright, 65%
• Open, 35%
  – Public Domain World, 18%
  – Public Domain (US), 12%
  – US Gov't, 5%
  – Open Access, <0.01%
  – Creative Commons, <0.01%

Special Collections: Islamic Manuscripts

Current Initiatives
1. Developing a shared print monographs archive
2. Expanding coverage and access to US government publications
3. Expanding support for computational (nonconsumptive) research

Shared Print Monographs Archive
• Ballot Initiative passed at the 2011 HT Constitutional Convention (Con-Con)
  – “To develop a print monographs archive corresponding to volumes represented within the HathiTrust”
• HathiTrust Board of Governors recently approved appointment of a PSC-designed task force to begin the process
Photo by Mal Booth, CC-BY-NC-ND, https://www.flickr.com/photos/malbooth/5100435988

Why a Shared Print Archive Program?
• Many regional efforts, but limited national/international coordination
• Strengthens preservation commitments
  – Connects both print and digital preservation
• Significant need and desire to reduce costs of collection management and associated footprint

Print Monographs Archive Task Force
• Tom Teper, Chair (University of Illinois)
• Clem Guthro (Colby College)
• Robert Kieft (Occidental College)
• Erik Mitchell (University of California, Berkeley)
• Jake Nadal (ReCAP)
• Jo Anne Newyear Ramirez (University of British Columbia)
• Matthew Sheehy (Harvard University)
• Emily Stambaugh (California Digital Library)
• Karla Strieb (Ohio State University)

Questions: Monographs Archive
• What incentives will encourage participation?
• What services and access models are needed?
• What structures are needed to establish commitments from multiple programs?
• What costs will be associated with the program and how should they be allocated?
• Which libraries are most likely to benefit?

Government Documents Initiative
• Ballot Initiative: provide “expanded coverage & enhanced access to U.S. Government Documents.”
• Activities:
  – Developing a registry of US Federal Government Documents
  – HathiTrust Board of Governors recently approved appointment of a PSC-designed Advisory Group to begin the process
Photo detail from http://babel.hathitrust.org/cgi/pt?id=mdp.39015087610286;view=1up;seq=14

Existing Government Documents
• 568,219 documents as of September 2014
• 441,096 from the CIC, working in conjunction with Google
• CIC libraries and the University of California will continue to supply documents

Working Group Membership
• Mark Sandler (CIC)
• Prue Adler (Association of Research Libraries)
• Ivy Anderson (California Digital Library)
• Joni Blake (Greater Western Library Alliance)
• Kirsten Clark (Minnesota)
• Rick Clement (New Mexico)
• Elizabeth Cowell (Santa Cruz)
• Mark Phillips (North Texas)
• Jon Rothman (Michigan)
• Judy Russell (Florida)
• Barbie Selby (Virginia)
• Jeremy York (HathiTrust)

Working Group Tasks
• Develop an overall strategy for building a comprehensive public domain corpus of U.S. federal documents in HathiTrust
• Recommend investments as needed to pursue the initiative
• Advise the Board on relevant policy issues
• Provide oversight and guidance as the project develops

The Registry
• Goal: “…include metadata for the comprehensive corpus of U.S. federal documents. This will include materials produced at U.S. government expense, in all formats, at the item level, from 1789 to the present.”

Computational Access
• HathiTrust distributes public domain datasets
• HathiTrust Research Center
  – Developed collaboratively by Indiana University and University of Illinois; launched July 2011
  – Funding from the Sloan Foundation, Andrew W. Mellon Foundation, and NEH Office of Digital Humanities
  – Partially funded by HathiTrust (2014-2018)
Photo by Nicolas Nova, CC-BY-NC, https://www.flickr.com/photos/nnova/3455992927

Mission of the HT Research Center
• Mission: Enable researchers world-wide to accomplish tera-scale text data-mining and analysis
  – Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
  – Develop cutting-edge software tools for processing and analyzing text
  – Develop translational tools and data that can be used to enhance HathiTrust Digital Library services to users

HTRC Governance
• Reports to the HathiTrust Board of Governors
  – Advisory Group in formation
• HTRC Executive Committee
  – J. Stephen Downie (Co-director), Professor and Associate Dean for Research, University of Illinois GSLIS
  – Beth Plale (Co-director and Chair), Director, Data to Insight Center, and Professor in the School of Informatics and Computing at Indiana University
  – Robert H. McDonald, Associate Dean of Libraries and Deputy Director, Data to Insight Center, at Indiana University
  – Beth Sandore Namachchivaya, Associate University Librarian for Information Technology Planning & Policy at the University of Illinois
  – John Unsworth, Vice Provost for Library & Technology Services and Chief Information Officer at Brandeis University

Goals for HTRC
• Provide a persistent and sustainable structure to enable original and cutting-edge research.
  – Leverage data storage and computational infrastructure at Indiana & Illinois
  – Stimulate community development of new functionality and tools
  – Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law.
  – Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.

HathiTrust Research Center UnCamp #3
March 30-31, 2015
University of Michigan
Ann Arbor, MI

Some Thoughts on the Present and Future

How are we positioned?
• Our mission, collection, and the repository operations are all strong.
• Our brand reputation is outstanding.
• Our work is solidly supported by the law.
• We have expanded access in unprecedented ways.
• The partnership provides a solid base for action.
• We have very important programs underway.

What needs thought?
• Strategy, mission, and role in the future
  – Membership growth
  – Collections program
  – Public policy
  – (Inter)National digital infrastructure
  – Services for members and the public
• Organizational
  – Engagement with researchers and libraries
  – Enabling more participation in plans and action
  – Standing on our own

Assumptions
• Our actions must align with the mission, goals, and purpose across our partnership.
• A few additional assumptions
  – We should pursue complementarity and cooperation, not competition and duplication.
  – Scale will continue to drive our strategies.
  – Potential partners are not just other libraries and library organizations, but also readers, authors, publishers.

How to find out more
• About: http://www.hathitrust.org/about
• Resources: http://www.hathitrust.org/resources
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter:
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust

Thank you!
furlough@hathitrust.org
@MikeFurlough