PowerPoint Presentation - Mass Digitization

Massively Digitizing
UC Library Collections
Google, Microsoft, and More
Learning in Retirement
Libraries – The Intersection of Tradition and Innovation
April 10, 2008
Ivy Anderson & Heather Christenson
California Digital Library
“11th University Library”
founded 1997
Part of UC Office of the President

Two Complementary Roles
Facilitate library collaboration across the ten campuses of the UC system (e.g. shared collection development)
Distinctive services emphasizing digital stewardship, innovation in scholarly publishing, and open-access digital collections

Three Audiences
UC libraries
Broader UC community
External constituencies and the general public

Five Programs
Collection Development and Management (Licensed Content, Shared Print Collections, Mass Digitization)
Bibliographic Services (Melvyl Catalog, SFX)
Preservation (Digital Preservation Repository, Web Archiving)
Digital Special Collections (Calisphere, Online Archive of California)
Publishing Services (eScholarship Repository, eScholarship Editions, collaboration with UC Press)

Digitization of Library Collections
Special Collections
Manuscripts, archival collections, photographs, etc.
CDL / UC Libraries
Berkeley, University of California, Bancroft Library, UCB 150, f. 252v
Online Archive of California
Calisphere

Digitization of Library Collections
Specialized Texts and Corpora
Making of America -10,000 texts in 10 years
CDL
eScholarship Editions

Digitization of Library Collections
Commercial Partnerships
Satans stratagems, 1648. copy from UCLA Library
EEBO: 100,000 important early English texts
Licensed access via ProQuest

…and Along Came Google
Google Library Project
2005: The ‘Google Five:’ Harvard, Oxford, New York Public Library, Stanford, University of Michigan
2008: 20 library partners in 5 countries
Google Publisher Partner Program

…and the Open Content Alliance
October 2005
Founders: Internet Archive, University of California, U of Toronto…
Large-scale digitization of out-of-copyright works only
A project of the Internet Archive

…and Microsoft
Out-of-Copyright Works Only

UC Involvement
October 2005: Founding Member of Open Content Alliance
August 2006: UC Joins Google Library Project
March 2007: Microsoft Digitization Agreement

So: Three Projects, One Goal


Goal: Mass digitization of library book collections

Google
In-copyright and out-of-copyright works
Available via Google search engine and Google Book Search

Microsoft
Out-of-copyright works only
Available via Microsoft Live Search

Open Content Alliance
Out-of-copyright works only
Available (via the Internet Archive website) to any and all search engines
Library and grant-funded

Why Are They Doing It?
Google’s vision: To put all the world’s information online
Google and Microsoft: To gain market share and competitive advantage for their search (and online advertising) services
It’s all about Search
OCA: To put the world’s information online, for free, forever
It’s all about the public good

Why Are We Doing It?

To enhance student and faculty research
To put our collections where our users are – in Google!
Mass digitization of these materials enhances access. It can make people aware of books they may not have discovered otherwise and lead them, through an internet search, back to our libraries.
To support deeper textual analysis and research. Scholars can trace the evolution of ideas and perform other sophisticated textual analysis when the full text is indexed and searchable by computer, opening scholarship in new ways.

To fulfill our public service mission
Many books of enduring general interest – including classic works of literature and more unique items such as early histories of the settlement of California and the West – can now be read by anyone, anywhere, anytime.

To preserve and protect our collections
In earthquake- and fire-prone California, digitizing books in our collections may also help protect the university from catastrophic loss should disaster someday strike our libraries.

Microsoft/OCA Project Contributors
Northern Regional Library Facility (NRLF)
Southern Regional Library Facility (SRLF)
UC Berkeley, Bancroft Library
UCLA

Google Project Contributors
Northern Regional Library Facility (NRLF) + UC Berkeley Systems
UC Santa Cruz
UC San Diego

CDL’s role, on behalf of UC
Liaison with partners
Planning & coordination
Funding
Stewardship of digital content
New services

Campuses Provide the Books
The Book Digitization Process
A world of barcodes, logistics, loading docks, packing materials, and scanning machines!
Reasons books might get rejected (images)

Costs to the UC Libraries
Staffing (2-5 FTE at each of 5 locations)
Physical space & facilities
 Scanning centers (where scanning machines are housed), book processing, queue storage (book trucks)
 Costs to run campus systems
CDL servers for inventory database, digital preservation

Digital files
Images
OCR - Text
OCR - Page coordinates
Metadata
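
The four kinds of digital files listed above (images, OCR text, OCR page coordinates, metadata) can be pictured as one record per book. The following is only an illustrative sketch: the class names, field names, file name, and sample values are assumptions made for this example and do not describe the actual file formats delivered by Google, Microsoft, or the Internet Archive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WordBox:
    """One OCR word plus its position on the page image (hypothetical layout)."""
    text: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class ScannedPage:
    """A page image, its OCR text, and the OCR page coordinates."""
    image_file: str
    ocr_text: str
    word_boxes: List[WordBox] = field(default_factory=list)


@dataclass
class DigitizedBook:
    """Descriptive metadata plus the scanned pages of one book."""
    metadata: Dict[str, str]
    pages: List[ScannedPage] = field(default_factory=list)


def find_word(book: DigitizedBook, word: str) -> List[Tuple[int, WordBox]]:
    """Return (page index, WordBox) for each case-insensitive match of `word`."""
    hits = []
    for index, page in enumerate(book.pages):
        for box in page.word_boxes:
            if box.text.lower() == word.lower():
                hits.append((index, box))
    return hits


# Made-up sample data, for illustration only.
book = DigitizedBook(
    metadata={"title": "An early history of California", "year": "1890"},
    pages=[
        ScannedPage(
            image_file="page_0001.jp2",
            ocr_text="California and the West",
            word_boxes=[
                WordBox("California", 120, 80, 310, 42),
                WordBox("West", 560, 80, 130, 42),
            ],
        )
    ],
)
matches = find_word(book, "california")
```

Keeping the page coordinates alongside the text is what allows a search hit to be highlighted on the page image itself.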
What sort of books are being digitized?
American history
Humanities
Science
Cookbooks
Children’s books
East Asian & Pacific Rim collections
Where can you access the books?
Google Book Search: http://books.google.com/
Microsoft Live Search Books: http://search.live.com/results.aspx?q=&scope=books
Internet Archive: http://www.archive.org/details/university_of_california_libraries
Test version of UC Union catalog: http://melvyl-test.cdlib.org:8164/F

Copyright status is a factor



Out of copyright, pre-1923
“orphan works,” 1923-1964
1965 - present
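
The three date bands above can be written as a simple lookup by publication year. This is a minimal sketch only: the function name and the returned labels are assumptions made for this example, and the year boundaries are taken directly from the slide. Actual copyright determination depends on more than the publication year.

```python
def copyright_band(publication_year: int) -> str:
    """Map a publication year to one of the three bands listed on the slide."""
    if publication_year < 1923:
        return "out of copyright"
    if publication_year <= 1964:
        # The slide labels 1923-1964 as the "orphan works" band.
        return "orphan works band"
    return "1965 - present"
```

For example, `copyright_band(1890)` falls in the first band, while a book published in 1950 falls in the middle band and would need further investigation before its status is known.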
At the frontier…
What’s ahead
Digital preservation – storage, storage, storage
Copyright determination
Print on demand

New modes of access & critical mass of digital books will transform scholarship
Full text search - new form of book discovery
Beyond search – text mining, computationally assisted research
Machines can interact with massive amounts of texts, and provide new structures

Questions?
Heather Christenson, CDL Mass Digitization Project Manager
heather.christenson@ucop.edu
Ivy Anderson, CDL Director of Collections
ivy.anderson@ucop.edu
For more information: http://www.cdlib.org/inside/projects/massdig/