What is HathiTrust and How Can it Make a Difference?
‘Sourcing and Scaling’ brought to the collective collection

What is HathiTrust?

HathiTrust is attempting nothing short of creating a comprehensive repository of published literature, primarily though not exclusively through digitization.

Language Distribution (1)
* All data was current as of March 20

Language Distribution (2)
The next 40 languages make up ~13% of total

Publication Date

Originating Institution

Content over time

HathiTrust is about collections, writ large, and not about Google digitization.

The first order of HathiTrust business is long-term preservation of this digital content, and we don’t believe in preservation without access.

HathiTrust takes the business of sustainability seriously, with regard to governance, finances and technology.

The HathiTrust Business Model, v.1: Depositor Pays
All of the reasonable costs of sustaining the archive, including replacement costs and a sort of insurance policy, are combined to create a sort of atomic cost unit (in this case, a GB of content).

What are “all of the reasonable costs of sustaining the archive”?

How much does it cost?
HathiTrust Functional Framework

Governance; Budget, Finances; Decision-making; Policy; Enterprise Management; Repository Administration; Communication and Coordination with partner institutions; Hardware configuration and maintenance; Data management (content storage, backup, integrity checks, deletion); Project management; Planning; Web and application server configuration and maintenance; Security; Hardware selection and replacement; Content and Metadata specifications; Permissions; Rights Management; Bibliographic Data Management; Copyright determination; Entity description (record-level); Copyright review; Object identification (item-level); Copyright information management (database); Data availability

Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)

Rightsholder permissions; Disaster Recovery; Logging; Processes for ensuring content integrity; e-Commerce; Print on Demand; Content Ingest; Content Access; Quality Assurance; User Services; Transformation; PageTurner; Quality Review; Usability; Validation; Collection Builder; Content Certification; User support (helpdesk); Large-scale Search; Financial contributions of partners; Research Center; Bibliographic Catalog; APIs; Outreach; Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC); Legal; Risk management (use of materials); Partner agreements; Advocacy

What is the cost of this atomic unit, the GB, and how does that relate to the content we’re storing?

What are replacement costs and an insurance policy?
High cost is offset by large benefit

Collective collections have their benefits

The HathiTrust Business Model, v.2: Costs based on holdings overlap and the perceived benefits we derive
• an ARL institution that wishes to use HathiTrust as part of a larger strategy, part of a “cloud” strategy
• new cost model: http://www.hathitrust.org/cost
• For public domain volumes: (PD*X*C)/N
• For a given in-copyright volume: IC = (C*X)/H
• sharing in the curation; having a voice in shaping the future

But how can HathiTrust make a difference???

Collective digital curation
• driving down costs
• reducing bibliographic indeterminacy
• making meaningful decisions about formats and quality
• increasing discoverability
• consolidating development talent
• improving strength of archiving

Collective print curation
• a means by which to associate all of our holdings of print
• perform record-keeping in a coordinated way
• improve description

Subsidiary benefits
• quantify problems
• exerting collective attention to solving problems
• scale!

“transfer resource[s] away from 'infrastructure' and towards user engagement.” Lorcan Dempsey
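
Worked sketch of the v.2 cost formulas: the slides give (PD*X*C)/N and IC = (C*X)/H without defining the symbols, so the readings used here are assumptions, not the official definitions: PD as the number of public domain volumes, C as a per-volume cost, X as an overhead multiplier, N as the number of partners, and H as the number of partners holding a given in-copyright volume. The function names, the example numbers, and the idea of summing IC over the volumes one partner holds are all hypothetical, for illustration only.

```python
def public_domain_share(pd_volumes, overhead, cost_per_volume, partners):
    """Per-partner public domain cost, (PD*X*C)/N.

    Symbol meanings are assumed (see lead-in), not taken from the slides.
    """
    if partners <= 0:
        raise ValueError("partners (N) must be positive")
    return pd_volumes * overhead * cost_per_volume / partners


def in_copyright_share(cost_per_volume, overhead, holders):
    """Per-holder cost of one in-copyright volume, IC = (C*X)/H."""
    if holders <= 0:
        raise ValueError("holders (H) must be positive")
    return cost_per_volume * overhead / holders


def partner_total(pd_volumes, overhead, cost_per_volume, partners, holder_counts):
    """Hypothetical total for one partner.

    Assumption: the partner pays the public domain share plus the IC share
    of every in-copyright volume it holds; holder_counts lists H per volume.
    """
    pd_cost = public_domain_share(pd_volumes, overhead, cost_per_volume, partners)
    ic_cost = sum(
        in_copyright_share(cost_per_volume, overhead, h) for h in holder_counts
    )
    return pd_cost + ic_cost


if __name__ == "__main__":
    # Made-up numbers: 1000 PD volumes, X = 2.0, C = 0.5, N = 10 partners,
    # and three in-copyright volumes held by 4, 2 and 1 partners.
    print(partner_total(1000, 2.0, 0.5, 10, [4, 2, 1]))
```

Under these assumptions, the more partners hold the same in-copyright volume (larger H), the smaller each partner's share for it, which is the "holdings overlap" effect the v.2 model names.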