HATHITRUST A Shared Digital Repository
HathiTrust: On TRAC
ICPSR Applied Data Science Repository Requirements and Assessment: HathiTrust
July 26, 2012
Jeremy York, Project Librarian, HathiTrust

Partnership
Arizona State University, Baylor University, Boston College, Boston University, California Digital Library, Columbia University, Cornell University, Dartmouth College, Duke University, Emory University, Florida State University, Getty Research Institute, Harvard University Library, Indiana University, Johns Hopkins University, Lafayette College, Library of Congress, Massachusetts Institute of Technology, McGill University, Michigan State University, New York Public Library, New York University, North Carolina Central University, North Carolina State University, Northwestern University, The Ohio State University, The Pennsylvania State University, Princeton University, Purdue University, Stanford University, Texas A&M University, Universidad Complutense de Madrid, University of Arizona, University of Calgary, University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz), The University of Chicago, University of Connecticut, University of Florida, University of Illinois, University of Illinois at Chicago, The University of Iowa, University of Maryland, University of Miami, University of Michigan, University of Minnesota, University of Missouri, University of Nebraska-Lincoln, The University of North Carolina at Chapel Hill, University of Notre Dame, University of Pennsylvania, University of Pittsburgh, University of Utah, University of Virginia, University of Washington, University of Wisconsin-Madison, Utah State University, Washington University, Yale University Library

Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
– 10.4 million volumes
– 5.5 million book titles
– 270,000+ serial titles
– 3.1 million public domain volumes (~30%)

Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

HathiTrust
Universal Library • Common Goal • Single Entity, Many Partners

Collections and Collaboration
• Comprehensive collection
– Preservation…with Access
• Shared strategies
• Public Good

Services
• Long-term preservation
– Bit-level and migration
• Bibliographic search
• Full-text search
• Reading and download capabilities
• Print on demand
• Collections
• Datasets, Research Center

Governance
• 12-member Board of Governors
– April 2012
• Manages budget and finances
• Budget separately held within the University of Michigan
• Strategic Advisory Board
• Working Groups and Committees

CRL Audit
• Why
– Value community standards
– Accountability, openness, transparency
• Desire to know how we were doing, and to let the community know

What is TRAC?
• Trusted Digital Repositories (OCLC, RLG), 2002
– A framework of attributes and responsibilities
– One of its recommendations was a process for certifying digital repositories
• TRAC (RLG, NARA), 2007
– CRL, nestor, DCC, National Library of Australia
• Administered by CRL in the US

CRL Audit (2)
• Guided by criteria included in TRAC, as well as other metrics developed by CRL
• HathiTrust's practices are sound…appropriate to the content being archived and the general needs of the CRL community.

What was involved?
• Timeline
– Data gathering: November 2009 to December 2010
– Site visit: May 2010
• Logistics
– Questions by email, documentation
– Phone conversations
– Staff: Project Librarian, Digital Preservation Librarian, Executive Director

Where we were
• Developmental stages
– Changing, growing
• Core pieces in place

Results
• Organizational Infrastructure (2)
– Mission statement, succession plan, staff, assessment, accountability, business plan, agreements
• Digital Object Management (3)
– Properties preserved, SIP, AIP, validation, naming conventions, identifiers, understandability, preservation strategies, logging, access policies
• Technologies, Technical Infrastructure, and Security (4)
– Hardware, software, error-handling, change management, security, staff roles, disaster preparedness

Key Issues
• Staff/Organization
• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen the quality assurance and print archiving components of the HathiTrust program

Executive Committee: Budget/Finances; Decision-making
Strategic Advisory Board: Guidance on Policy, Planning
Collective Work: Working Groups and Committees
• Strategic: Collections; Discovery Interface; Full-text Search
• Operational: Communications; User Support; User Experience

Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management

HathiTrust Functional Framework (diagram labels, in extracted order)
HathiTrust Governance: Budget, Finances; Decision-making; Policy
Enterprise Management; Repository Administration; Communication and Coordination with partner institutions; Hardware configuration and maintenance; Data management (content storage, backup, integrity checks, deletion); Project management; Planning; Web and application server configuration and maintenance; Security; Hardware selection and replacement; Content and Metadata specifications; Permissions
Rights Management; Bibliographic Data Management; Copyright determination; Entity description (record-level); Copyright review; Object identification (item-level); Copyright information management (database); Data availability
Collection Development
• Digital
– Expansion beyond books and journals (born-digital, images and maps, audio)
– Selection of content (for non-Google volume ingest and pilot projects)
• Print
– Cloud Library (effect of digital on print)
Rightsholder permissions; Disaster Recovery; Logging; Processes for ensuring content integrity; e-Commerce; Print on Demand; Content Ingest; Content Access; Quality Assurance; User Services; Transformation; PageTurner; Quality Review; Usability; Validation; Collection Builder; Content Certification; User support (helpdesk); Large-scale Search; Financial contributions of partners; Research Center; Bibliographic Catalog; APIs
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy

Key Issues (2)
• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen the quality assurance and print archiving components of the HathiTrust program

Future Work
• Disaster Recovery
• Change Management
– Moving to new formats: image, audio, born-digital
• Governance
• Certification updates

Thank you very much!