Global Cooperation for Global Access: The Million Book Project
Denise Troll Covey
Principal Librarian for Special Projects
Carnegie Mellon
CRIS 2004 – Antwerp, Belgium

The Million Book Project
• Digitize & provide open access to a million books
• Vision, leadership, & research – Carnegie Mellon
• $$ Equipment & travel – NSF
• $$ Labor & research – India & China
“Attempt to understand & solve the technical, economic, & social policy issues of providing online access to all creative works of the human race.”
Raj Reddy

National Surveys of Students & Faculty
• 90% want convenient, speedy, easy access
  – The only thing they want more is quality information
• 61% want remote access to full-text e-resources
• Fewer than half think the library meets these needs
• 48% start with Google or other Internet search engine
Gloriana St. Clair

National Surveys of Undergraduates
• 96% believe surface web information is adequate
• 72% use an Internet search engine
• 48% believe library web site information is inferior
• 46% use online resources all or most of the time
• Efficiency is more important than relevance
Michael Shamos

Carnegie Mellon Graduate Students
• 82% start with an Internet search engine
• Getting information from the web is at least twice as easy as getting it from library e-resources
• Using library e-resources is about as convenient as getting information from professors or classmates
• 24% often can’t get information when they need it
  – Out of print books & old journals

Social Significance
• Help meet the need for convenient, speedy, easy, remote access to quality academic resources
• Address disparity in library size & accessibility
• Democratize & facilitate new knowledge
• Support digital library research
• Preserve heritage

Collection of Collections
• What librarians select & partners want
  – Books for College Libraries (BCL)
  – Technical reports
  – Cultural artifacts
  – Government documents
• What we can acquire
  – Bulk, cheap, fast
Nov 2001 – NSF Planning meeting
Michael Lesk

Seeking Copyright Permission for Open Access
                       Response rate    Success rate     Success rate
                       per contacts     per responses    per contacts
Random books           58%              43%              25%
Posner fine books *    76%              70%              53%
• Increased success: improved request letter, prompt follow up, nature of collection, & ability to preview
• University presses, scholarly associations, & estates are more likely than commercial presses to grant permission
• Transaction cost of $78 per volume is too expensive

Shift from Per Title to Per Publisher
[Charts: Initial and Current; legend: In Copyright, Public Domain, Indigenous Materials]
Requires 18% success rate with BCL publishers & 500 books each

Copyright Permission Request Letter
• Educate
  – Users want to find information online, but use print
  – Online access increases use, even use of older works
  – Open access does not decrease & can increase sales
  – Currently no revenue from out-of-print books

Request & Incentive
• Ask for non-exclusive permission to digitize
  – All out of print, in copyright titles
  – All titles published prior to a date of their choosing
  – All titles published # or more years ago
  – List of titles they provide
• Assure
  – Following preservation standards & copyright law
  – Print & save only one page at a time
• Give
  – images, metadata, & OCR
$$$$

Early – Preliminary – Statistics
                          Million Book        Posner
                          Copyright Owners    Copyright Owners
Total                     206                 107
1. Owners contacted       100%                65%
2. Owners responded       24%                 76%
3. Success - Responses    57%                 70%
   Success - Contacts     14%                 53%
Nov 2003 – Mar 2004
Many more follow up negotiations to be done
Don’t yet know number of titles

Success Rate Comparison
Based on responses
[Bar chart: success rate, 0% to 100%, by owner type (Scholarly associations, University presses, Commercial publishers, Authors/Estates, Other) for Random books, Posner books, Million books]

Digital Registry
• Registry of reproductions of books & journals digitized or queued for digitization
  – Reduce duplication
  – More access for less cost
• Registry signals
  – Intent to preserve & make accessible in entirety
  – Compliance with standards & best practices
  – Professionally managed storage & maintenance
  – Use copy available for public access
Release May 31, 2004

Acquisitions & Shipping
• Acquisitions
  – Copyrighted books – OCLC locating in partner libraries
  – Out of copyright – weeding; depositories; duplicates
• Lessons learned from pilot shipment to India
  – Reduce cost to $1 per book round trip by changing packing
  – Reduce time by distributed shipping & knowing customs
• Lessons learned working with China
  – Customs & content issues initially prohibited shipping
  – Scanning centers declared free enterprise zones 2004

Metadata & Digitization
• Following standards
• Operators scan & post-process
  – Above average wages
  – 4000 books per year per scanner (two shifts per day)
  – 400,000 books per year with 100 scanners
• Librarians capture metadata
  – Bibliographic: MARC or DC
  – Administrative: copyright permission & source library

Sustainability
• Following standards will enable migration
• Organizations committed to host the Collection
  – Carnegie Mellon
  – University of California at Merced
  – Internet Archive
  – DL of India
  – Perhaps OCLC
  – China
• Goal is to have ten mirror sites
  – Estimated cost is one million dollars
  – Estimated size is 20 terabytes
Brewster Kahle

Issues & Next Steps
• Adding value
  – Negotiating with Amazon.com for print on demand
• Updating workflow & processing the backlog
• Coordinating acquisition & shipping
• Integrating the collection
• Improving the interface
• Copyright permission work

Thank you!
Denise Troll Covey – troll@andrew.cmu.edu
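
Note on the figures: the copyright permission percentages and the scanning throughput quoted in the slides can be cross-checked with simple arithmetic. The sketch below is only an illustration. It assumes the percentages pair up as response rate and success rate per responses for each study (Random books 58% and 43%, Posner fine books 76% and 70%, Million Book 24% and 57%), and it multiplies them to recover the success rate per contacts. The function and variable names are hypothetical, and the "years to a million books" value is derived here, not stated in the slides.

```python
# Back-of-the-envelope checks of figures quoted in the slides.
# All names below are illustrative; they are not part of the project.

def success_per_contact(response_rate: float, success_per_response: float) -> float:
    """Share of contacted owners who granted permission."""
    return response_rate * success_per_response

# (response rate per contacts, success rate per responses), as read from the slides.
# The pairing of percentages to studies is an assumption.
studies = {
    "Random books": (0.58, 0.43),
    "Posner fine books": (0.76, 0.70),
    "Million Book (Nov 2003 - Mar 2004)": (0.24, 0.57),
}

# Success rate per contacts, rounded to whole percent as on the slides.
per_contact = {
    name: round(success_per_contact(resp, succ) * 100)
    for name, (resp, succ) in studies.items()
}

# Scanning throughput from the Metadata & Digitization slide.
BOOKS_PER_SCANNER_PER_YEAR = 4000   # two shifts per day
SCANNERS = 100
books_per_year = BOOKS_PER_SCANNER_PER_YEAR * SCANNERS

# Derived value (not stated in the slides): time to scan a million books
# at that rate, ignoring ramp-up and backlog.
years_for_million = 1_000_000 / books_per_year

for name, pct in per_contact.items():
    print(f"{name}: {pct}% success per contacts")
print(f"Throughput: {books_per_year} books per year")
print(f"Years to one million books at that rate: {years_for_million}")
```

Under these assumptions the products round to 25%, 53%, and 14%, which match the success rates per contacts shown for the three studies, and 100 scanners at 4000 books each give the 400,000 books per year quoted in the slides.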