The Million Book Project Global Cooperation for Global Access: Denise Troll Covey

Global Cooperation for Global Access:
The Million Book Project
Denise Troll Covey
Principal Librarian for Special Projects
Carnegie Mellon
CRIS 2004 – Antwerp, Belgium
The Million Book Project
• Digitize & provide open access to a million books
• Vision, leadership, & research – Carnegie Mellon
• $$ Equipment & travel – NSF
• $$ Labor & research – India & China
“Attempt to understand & solve
the technical, economic, & social policy
issues of providing online access
to all creative works of the human race.”
Raj Reddy
National Surveys of Students & Faculty
• 90% want convenient, speedy, easy access
– The only thing they want more is quality information
• 61% want remote access to full-text e-resources
• Fewer than half think
the library meets these needs
• 48% start with Google
or other Internet search engine
Gloriana St. Clair
National Surveys of Undergraduates
• 96% believe surface web information is adequate
• 72% use an Internet search engine
• 48% believe library web site information is inferior
• 46% use online resources
all or most of the time
• Efficiency is more important
than relevance
Michael Shamos
Carnegie Mellon Graduate Students
• 82% start with an Internet search engine
• Getting information from the web is at least twice
as easy as getting it from library e-resources
• Using library e-resources is about as convenient as
getting information from professors or classmates
• 24% often can’t get information
when they need it
– Out of print books & old journals
Social Significance
• Help meet the need for convenient, speedy, easy,
remote access to quality academic resources
• Address disparity in library size & accessibility
• Democratize & facilitate new knowledge
• Support digital library research
• Preserve heritage
Collection of Collections
• What librarians select & partners want
– Books for College Libraries (BCL)
– Technical reports
– Cultural artifacts
– Government documents
• What we can acquire
– Bulk, cheap, fast
Nov 2001 – NSF planning meeting
Michael Lesk
Seeking Copyright Permission
for Open Access
                      Response rate    Success rate     Success rate
                      per contacts     per responses    per contacts
Random books          58%              43%              25%
Posner fine books *   76%              70%              53%
• Increased success: improved request letter, prompt follow-up,
nature of collection, & ability to preview
• University presses, scholarly associations, & estates are
more likely than commercial presses to grant permission
• Transaction cost of $78 per volume is too high
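The three rates on this slide are linked by simple arithmetic: success per contacts is the response rate multiplied by success per responses. The sketch below checks that relationship. It assumes the slide's figures pair up as 58% and 43% for random books and 76% and 70% for Posner books. The function name is illustrative and not taken from the project.

```python
def success_per_contact(response_rate: float, success_per_response: float) -> float:
    """Fraction of all contacted owners who granted permission."""
    return response_rate * success_per_response


# Assumed pairing of the slide's figures: (response rate, success per responses).
studies = {
    "Random books": (0.58, 0.43),
    "Posner fine books": (0.76, 0.70),
}

for name, (responded, granted) in studies.items():
    rate = success_per_contact(responded, granted)
    print(f"{name}: {rate:.0%} of contacts granted permission")
```

Under that pairing the products round to the 25% and 53% reported as success per contacts.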
Shift from Per Title to Per Publisher
[Figure: collection composition, Initial vs. Current, by category: In Copyright, Public Domain, Indigenous Materials]
Requires 18% success rate with BCL publishers & 500 books each
Copyright Permission Request Letter
• Educate
– Users want to find information online, but use print
– Online access increases use, even use of older works
– Open access does not decrease & can increase sales
– Currently no revenue
from out-of-print books
Request & Incentive
• Ask for non-exclusive permission to digitize
– All out of print, in copyright titles
– All titles published prior to a date of their choosing
– All titles published # or more years ago
– List of titles they provide
• Assure
– Following preservation standards & copyright law
– Print & save only one page at a time
• Give – images, metadata, & OCR  $$$$
Early – Preliminary – Statistics
                          Million Book        Posner
                          Copyright Owners    Copyright Owners
Total                     206                 107
1. Owners contacted       100%                65%
2. Owners responded       24%                 76%
3. Success - Responses    57%                 70%
   Success - Contacts     14%                 53%
Nov 2003 – Mar 2004
Many more follow-up negotiations to be done
Don’t yet know number of titles
Success Rate Comparison
[Chart: success rate based on responses (0% to 100%), by type of copyright owner: scholarly associations, university presses, commercial publishers, authors/estates, other. Series: Random books, Posner books, Million books]
Digital Registry
• Registry of reproductions of books & journals
digitized or queued for digitization
– Reduce duplication
– More access for less cost
Release: May 31, 2004
• Registry signals
– Intent to preserve & make accessible in entirety
– Compliance with standards & best practices
– Professionally managed storage & maintenance
– Use copy available for public access
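One way to picture how a registry reduces duplication is a lookup keyed on a work identifier: a partner checks the registry before queueing a book, and a second registration of the same work is refused. This is a minimal sketch only. The class names, fields, and the identifier are hypothetical and do not describe the actual Digital Registry design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RegistryEntry:
    identifier: str  # hypothetical key, e.g. an ISBN or catalog number
    title: str
    status: str      # "digitized" or "queued"
    holder: str      # institution responsible for the digital copy


class Registry:
    """In-memory stand-in for a shared registry of digitized works."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> bool:
        """Add an entry; return False if the work is already registered."""
        if entry.identifier in self._entries:
            return False
        self._entries[entry.identifier] = entry
        return True

    def lookup(self, identifier: str) -> Optional[RegistryEntry]:
        """Return the existing entry for a work, or None if unregistered."""
        return self._entries.get(identifier)


registry = Registry()
first = RegistryEntry("example-0001", "Sample Title", "queued", "Library A")
duplicate = RegistryEntry("example-0001", "Sample Title", "queued", "Library B")

print(registry.register(first))      # True: the work was not yet registered
print(registry.register(duplicate))  # False: Library B can skip scanning it
```

A real registry would also need to record the signals listed above, such as standards compliance and storage arrangements, which this sketch leaves out.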
Acquisitions & Shipping
• Acquisitions
– Copyrighted books – OCLC locating in partner libraries
– Out of copyright – weeding; depositories; duplicates
• Lessons learned from pilot shipment to India
– Reduce cost to $1 per book round trip by changing packing
– Reduce time by distributed shipping & knowing customs
• Lessons learned working with China
– Customs & content issues initially prohibited shipping
– Scanning centers declared free enterprise zones in 2004
Metadata & Digitization
• Following standards
• Operators scan & post-process
– Above average wages
– 4000 books per year per scanner (two shifts per day)
– 400,000 books per year with 100 scanners
• Librarians capture metadata
– Bibliographic: MARC or DC
– Administrative: copyright permission
& source library
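The throughput figures above can be turned into a rough capacity estimate. The sketch below uses only the slide's rate of 4000 books per scanner per year and the one-million-book target. It assumes every scanner sustains that rate all year, which is a simplification, and the function names are illustrative.

```python
import math

BOOKS_PER_SCANNER_PER_YEAR = 4_000  # from the slide, at two shifts per day
TARGET_BOOKS = 1_000_000


def years_to_target(scanners: int, target: int = TARGET_BOOKS) -> float:
    """Years needed to scan `target` books with a fixed number of scanners."""
    return target / (scanners * BOOKS_PER_SCANNER_PER_YEAR)


def scanners_needed(years: float, target: int = TARGET_BOOKS) -> int:
    """Smallest number of scanners that reaches `target` within `years`."""
    return math.ceil(target / (years * BOOKS_PER_SCANNER_PER_YEAR))


print(years_to_target(100))  # 2.5 years at 400,000 books per year
print(scanners_needed(2))    # 125 scanners to finish in two years
```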
Sustainability
• Following standards will enable migration
• Organizations committed to host the Collection
– Carnegie Mellon
– University of California at Merced
– Internet Archive
– Digital Library of India
– China
– Perhaps OCLC
• Goal is to have ten mirror sites
– Estimated cost is one million dollars
– Estimated size is 20 terabytes
Brewster Kahle
Issues & Next Steps
• Adding value
– Negotiating with Amazon.com for print on demand
• Updating workflow & processing the backlog
• Coordinating acquisition & shipping
• Integrating the collection
• Improving the interface
• Copyright permission work
Thank you!
Denise Troll Covey – troll@andrew.cmu.edu