Collaborating Globally, Planning Locally

HATHITRUST
A Shared Digital Repository
HathiTrust and New Opportunities in
Collection Management
GWLA/UNM: Emerging Collection Management Opportunities
September 26, 2011
Jeremy York
Project Librarian, HathiTrust
Partnership
Arizona State University
Baylor University
Boston University
California Digital Library
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Getty Research Institute
Harvard University Library
Indiana University
Johns Hopkins University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
McGill University
Michigan State University
New York Public Library
New York University
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Texas A&M University
Universidad Complutense de Madrid
University of Calgary
University of California
  Berkeley
  Davis
  Irvine
  Los Angeles
  Merced
  Riverside
  San Diego
  San Francisco
  Santa Barbara
  Santa Cruz
The University of Chicago
University of Connecticut
University of Florida
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Miami
University of Michigan
University of Minnesota
University of Missouri
University of Nebraska-Lincoln
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Yale University Library
The Name
• The meaning behind the name
– Hathi (hah-tee)--Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy
Mission
• To contribute to the common good by collecting,
organizing, preserving, communicating, and
sharing the record of human knowledge
Collections and Collaboration
• Comprehensive collection
- Preservation…with Access
• Shared strategies
– Collection management, development
– Copyright
– Preservation (digital and print)
– Bibliographic Indeterminacy
– Discovery / Use
– Efficient user services
• Public Good
Governance
[Diagram labels: Executive Committee; Strategic Advisory Board; HathiTrust; Budget/Finances; Decision-making; Guidance on Policy, Planning]
How does work get done?
• Collective work
– e.g., working groups
– Perform the work of the partnership
– Now 40+ people across partner institutions
• Distributed work
– Driven by needs of institutions – able to leverage
across the partnership
– Projects, e.g. grant work, ingest specifications,
page-turner, bibliographic data management
• Leverage expertise across institutions
How is work prioritized?
• Initial functional objectives
• Collective processes
– Working groups and committees
• Constitutional Convention
– Ballot Proposals
Partnership
• Who can become a partner?
– Institutions worldwide
– Libraries with print holdings
What are the benefits? (1)
• Cost-effective long-term preservation and access services
for digitized content
– Commitments on digital content facilitate decisions about
digitization efforts and print collection management
• For those with content, immediate long-term
preservation, bibliographic and full-text search, and
collection-building
• With content or not, full viewing and downloading
capabilities for public domain materials and materials for
which we have received permissions
What are the benefits? (2)
• Specialized access to public domain and in-copyright materials
for users with print disabilities
• Other lawful uses of in-copyright materials such as Section
108 uses (print replacement copies, digital access to
applicable works), orphan works
• HathiTrust encourages participation in initiatives and
resources geared toward
– Shared collection development and management (e.g., copyright
review work, print holdings database, de-duplication, collaboration
with other organizations and initiatives)
– Participation in governance and collaborative initiatives
– Defining directions for the shared library.
What’s involved?
• Contract
– Sustaining
– Content-Contributing
• Yearly fees
• Commitment
– 5-year periods
• Shibboleth
• Print Holdings
Consortial Membership?
• No pricing benefit for consortia
• Benefits arise where consortia can offer
services that reduce costs for members
– Coordinating ingest, print holdings, other?
Collections: what we have
Content Distribution
• In Copyright: 73%
• Public Domain: 27%
9,706,923 Total volumes
2,636,483 “Public domain”
5,153,036 Book titles
255,907 Serial titles
* As of September 24, 2011
Content Distribution
• In Copyright: 73%
• "Public Domain": 27%
– Public Domain (worldwide): 14%
– Public Domain (US): 10%
– US Gov Docs: 3%
– Open Access: .1%
– Creative Commons: .01%
* As of September 24, 2011
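The 27% / 73% split can be checked against the volume counts quoted above. The following is only a minimal sketch of that arithmetic; the constant and function names are illustrative and not part of any HathiTrust tool.

```python
# Volume counts as quoted on the slide (as of September 24, 2011).
TOTAL_VOLUMES = 9_706_923
PUBLIC_DOMAIN_VOLUMES = 2_636_483


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


if __name__ == "__main__":
    public_domain = share(PUBLIC_DOMAIN_VOLUMES, TOTAL_VOLUMES)
    in_copyright = 100 - public_domain
    # Rounded to whole percentages, as on the chart.
    print(f"Public domain: {round(public_domain)}%")
    print(f"In copyright:  {round(in_copyright)}%")
```

Rounded to whole percentages, the two shares match the chart labels.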
Dates
Breakdown of HathiTrust book corpus by publication date
• 0-1500: 0%
• 1500-1599: 0%
• 1600-1699: 0%
• 1700-1799: 1%
• 1800-1849: 3%
• 1850-1899: 7%
• 1900-1909: 4%
• 1910-1919: 4%
• 1920-1929: 4%
• 1930-1939: 4%
• 1940-1949: 4%
• 1950-1959: 6%
• 1960-1969: 11%
• 1970-1979: 13%
• 1980-1989: 15%
• 1990-1999: 14%
• 2000-2009: 10%
* As of September 24, 2011
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building – 2/2011
Language Distribution (1)
• English: 48%
• German: 9%
• French: 7%
• Spanish: 5%
• Chinese: 4%
• Russian: 4%
• Japanese: 3%
• Italian: 3%
• Arabic: 2%
• Latin: 1%
• Remaining Languages: 14%
The top 10 languages make up
~86% of all content
* As of September 24, 2011
Language Distribution (2)
The next 40 languages make up ~13% of total
[Pie chart of the next 40 language categories; the per-language percentages are not recoverable from this extraction. Categories shown: Ancient Greek, Armenian, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Korean, Malay, Malayalam, Marathi, Multiple, Music, Norwegian, Panjabi, Persian, Polish, Portuguese, Romanian, Sanskrit, Serbian, Slovak, Slovenian, Swedish, Tamil, Thai, Turkish, Ukrainian, Undetermined, Unknown, Urdu, Vietnamese]
* As of September 24, 2011
Content over time
HathiTrust Content Growth
[Chart: share of content (0% to 100%) by contributing institution over time. Institutions shown: Michigan, California, Wisconsin, Cornell, NYPL, Princeton, Indiana, Minnesota, Harvard, LoC, Columbia, Madrid, Virginia]
* As of September 24, 2011
Collaboration:
Collection Management
The Cloud Library
• Toward a Cloud Library
– CLIR, Mellon Foundation
– OCLC Research, NYU, HathiTrust, Recap Libraries
• Objective: Characterize the near-term opportunity for externalizing
management of academic research collections leveraging capacity
of large-scale shared print and digital repositories
• Outcomes: opportunity and risk assessment based on aggregate
collection analysis; draft service agreement enabling generic
consumer library to selectively outsource preservation and access
of low-use research collections to large-scale print and digital
repositories (Malpas, RLG Partner Update, January 2010)
A global change in the library environment
Academic print book collection already substantially
duplicated in mass digitized book corpus
[Chart: % of Titles in Local Collection (0% to 60%) against Rank in 2008 ARL Investment Index (0 to 120)]
• June 2009: Median duplication 19%
• June 2010: Median duplication 31%
Continuing growth of overlap …
• ARL overlap
– 31% in June 2010
– 33% in Dec (adjustment: adding little-held works)
– ~ 1% per 225,000 vols
– 45% by December, 2011
• Oberlin Group overlap
– Close to 9% points higher
– 41% in December, 2010
– Close to 50% in May, 2011
– Higher rate of overlap per added volume?
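The ARL figures above imply a rough volume count behind the projection. The sketch below works that out under a loudly stated assumption: that overlap keeps growing linearly at the quoted rate of about 1 percentage point per 225,000 volumes. The function name is illustrative only.

```python
# Figures quoted on the slide; treating growth as linear is an assumption.
VOLUMES_PER_POINT = 225_000   # "~1% per 225,000 vols"
OVERLAP_DEC_2010 = 33.0       # percent, after the December adjustment
OVERLAP_DEC_2011 = 45.0       # percent, projected


def volumes_needed(start_pct: float, target_pct: float,
                   volumes_per_point: int = VOLUMES_PER_POINT) -> int:
    """Volumes to add for median overlap to rise from start_pct to target_pct."""
    return round((target_pct - start_pct) * volumes_per_point)


if __name__ == "__main__":
    # 12 percentage points at 225,000 volumes per point.
    print(volumes_needed(OVERLAP_DEC_2010, OVERLAP_DEC_2011))  # 2700000
```

Under that linear assumption, moving from 33% to 45% corresponds to roughly 2.7 million added volumes over the year.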
Digitized Books in Shared Repositories
~75% of mass digitized corpus is ‘backed up’ in one
or more shared print repositories
[Chart: Unique Titles (0 to 3,500,000) by month, Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M]
New Cost Model
• Original model based on GB contributed
• New model based on overlap of print collections
with HathiTrust digital collections
• Supported by print holdings database
• Database will
– Support expansion of legal uses of materials:
preservation uses, access for users who have print
disabilities, access to orphan works
– Facilitate individual and collaborative collection
development and management operations
– Benefit de-duplication efforts
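The slide does not give the fee formula, only that the new model rests on overlap between a library's print collection and the HathiTrust digital collection. As a minimal sketch of that overlap measure alone, assuming holdings are matched on OCLC number (the identifiers below are made up for illustration):

```python
from __future__ import annotations


def overlap_rate(print_holdings: set[str], digitized: set[str]) -> float:
    """Fraction of a library's print holdings that also appear in the digital set.

    Both arguments are sets of OCLC numbers. Matching on OCLC number is an
    assumption of this sketch, not a documented HathiTrust procedure.
    """
    if not print_holdings:
        return 0.0
    return len(print_holdings & digitized) / len(print_holdings)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    library = {"1001", "1002", "1003", "1004"}
    hathitrust = {"1002", "1004", "9999"}
    print(f"{overlap_rate(library, hathitrust):.0%}")  # 50%
```

How an overlap rate like this would translate into a fee is not stated in the presentation and is left out here.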
Print Holdings Database
• Volumes institutions own or have owned
– For monographic holdings
- Only print volumes (not microform, etc.)
- OCLC number [required]
- Bib record ID [required]
- Enumeration/chronology, if available
- Condition (e.g., brittle) [optional]
- Holding Status (e.g., current holding, withdrawn, missing, etc.) [optional]
– For serial holdings
- OCLC number [required]
- Bib record ID [required]
- ISSN, if available
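The required and optional fields listed above could be modelled roughly as follows. This is a sketch only: the class names, field names, and the check are hypothetical and do not reflect the actual submission format, which the slide does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonographHolding:
    oclc_number: str                      # required
    bib_record_id: str                    # required
    enum_chron: Optional[str] = None      # enumeration/chronology, if available
    condition: Optional[str] = None       # optional, e.g. "brittle"
    holding_status: Optional[str] = None  # optional, e.g. "withdrawn"


@dataclass
class SerialHolding:
    oclc_number: str                      # required
    bib_record_id: str                    # required
    issn: Optional[str] = None            # if available


def missing_required(record) -> List[str]:
    """Return the names of required fields that are empty on a record."""
    required = ("oclc_number", "bib_record_id")
    return [name for name in required
            if not (getattr(record, name) or "").strip()]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    book = MonographHolding("12345", "b1001", condition="brittle")
    journal = SerialHolding("", "b2002", issn="1234-5678")
    print(missing_required(book))     # []
    print(missing_required(journal))  # ['oclc_number']
```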
Every library is different
• Our median rate of overlap may be the same
• But our overlap profiles will differ by library
• Our use patterns differ
• Our risk profiles differ
• Our roles vis-à-vis our constituencies differ
• Thus, the need to act independently on
common data
Cooperative Print Monograph Archive
• Print monograph storage proposal
– Enable partners to register commitments
– Establish definitions (e.g., environment, use and
condition)
– Build in cost-sharing: collectively fund those that
make commitments
– Communicate information to partners to facilitate
decision-making
http://www.hathitrust.org/constitutional_convention2011
Quality
• IMLS grant led by Paul Conway
• Metrics and measures of quality
• Certification
Changing Library Landscape
Print Monograph Archive Proposal (HathiTrust Collections Committee):
• “…the potential for ubiquitous information access…challenges the
very foundation underlying the development of vast collections of
printed literature in our nation’s libraries.”
• “This model for collection development and access…is today
becoming less and less relevant to our core mission”
• “From its inception, HathiTrust has aspired to reshape the
landscape of research libraries. This landscape includes the
management of vast, highly-redundant collections of printed
resources for which readily accessible digital instantiations are
increasingly available.”
• “With the advent of HathiTrust…the opportunity exists for our
institutions to not only work together to profoundly influence the
landscape in which we provide access to cultural resources but to
profoundly influence the mechanisms by which we ensure the
persistence of the printed record”
Thank you!
How to find out more
• Web site “About” section:
http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter:
http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org