HATHITRUST
A Shared Digital Repository
Sharing Collections through
Shared Stewardship
A HathiTrust Progress Report
Unless otherwise noted, these slides and their
contents are licensed under a Creative Commons
Attribution Unported License.
Northwestern University
October 21, 2014
Mike Furlough
Executive Director, HathiTrust
Today’s Conversation
• Do you really know HathiTrust?
– How things work
– Collections and data
• What are we working on now?
• How has the world changed since we began?
– And what does that mean for HathiTrust?
The Partnership
Mission
To contribute to the common good by collecting,
organizing, preserving, communicating, and sharing the
record of human knowledge.
Efforts include, but are not limited to
…building comprehensive collections co-owned and
managed by partners.
…enabling access by users with print disabilities.
…supporting computational research with the collections.
…stimulating shared collection storage strategies among
libraries.
Timeline: Highlights
• Google Library Project announced (2004)
• Launch (2008)
• TRAC certification (2011)
• Constitutional convention (2011)
• 10 million volumes (2012)
• New governance established (2012)
• Current bylaws and fee structure (2013)
• 12 million volumes (2014)
HathiTrust Members
Allegheny College
Arizona State University
Baylor University
Boston College
Boston University
Brandeis University
Brown University
California Digital Library
Carnegie Mellon University
Colby College
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Florida State University
Getty Research Institute
Harvard University Library
Indiana University
Iowa State University
Johns Hopkins University
Kansas State University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
McGill University
Michigan State University
Montana State University
Mount Holyoke College
New York Public Library
New York University
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Rutgers University
Stanford University
Syracuse University
Temple University
Texas A&M University
Texas Tech
Tufts University
Universidad Complutense de Madrid
University of Alabama
University of Alberta
University of Arizona
University of British Columbia
University of Calgary
University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz)
The University of Chicago
University of Connecticut
University of Delaware
University of Florida
University of Houston
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Kansas
University of Maine
University of Maryland
University of Massachusetts, Amherst
University of Miami
University of Michigan
University of Minnesota
University of Missouri
University of Nebraska-Lincoln
University of New Mexico
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Oklahoma
University of Pennsylvania
University of Pittsburgh
University of Queensland
University of Tennessee, Knoxville
University of Texas
University of Utah
University of Vermont
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Vanderbilt University
Virginia Tech
Wake Forest University
Washington University
Yale University Library
Growth in Membership
2008: 13
2009: 25
2010: 49
2011: 64
2012: 77
2013: 91
2014: 101
Shared Responsibilities
• Leverage expertise across institutions
– Collective work
• Distributed Infrastructure
– Preservation repository and access services
• University of Michigan
• Mirror site: Indiana University
– Metadata management services (Zephir)
• California Digital Library
– HathiTrust Research Center
• Indiana University and University of Illinois
Governance
Board of Governors
Program Steering Committee
HathiTrust Members
Committees and Working Groups
Operations
Executive Director
HathiTrust Board of Governors
Five-year terms (beginning Jan 2013):
Betsy Wilson (University of Washington)
Bob Wolven (Columbia University)
Four-year terms:
Richard Clement (University of New Mexico)
Patricia Steele (University of Maryland)
Three-year terms:
Carol Mandel (New York University)
Sarah Michalak (University of North Carolina-Chapel Hill)
Members appointed by the founding institutions:
James Hilton (University of Michigan)
Carol Diedrichs (Ohio State University)
Laine Farley (California Digital Library)
Wendy Lougee (University of Minnesota)
Brian Schottlaender (UC, San Diego)
Brenda Johnson (Indiana University)
Ex Officio (Board, PSC, Executive Committee):
Mike Furlough, Executive Director
Executive Committee
- Chair
- Past Chair
- Treasurer
- Chair of PSC
- Executive Director
Program Steering Committee
• Serves at the direction of the Board of Governors
to:
– Review HathiTrust’s development agenda,
shaping initiatives and strategies for Board
discussion and decision-making, and considering
the implications of those initiatives for the future.
– Recommend alterations in the development
agenda based on such reviews and, based on its
reviews, develop position papers for the member
community to encourage debate or mobilize
discussion with regard to particular issues.
– Work with the Board of Governors to develop
policies for HathiTrust and its members.
Program Steering Committee
Membership
• Ivy Anderson (CDL)
• John Butler (Minnesota)
• Chris Freeland (Washington University)
• Todd Grappone (UCLA)
• Martha Hruska (UC San Diego)
• Martin Kurth (New York University)
• Erika Linke (Carnegie Mellon University)
• Robert McDonald (Indiana)
• Matthew Sheehy (Harvard)
• Elaine Westbrooks (Michigan)
• Bob Wolven, Chair (Columbia)
PSC Actions 2013-2014
• Initial focus on Constitutional Convention
Ballot Initiatives
• Re-established Collections Committee
• Created Rights & Access Working Group
• Charged Zephir Advisory Group
Standing Committees
and Working Groups
• Collections Committee
• Rights and Access Working Group
• User Support Working Group
Collections and Access
Growth of Collection
2008: 2,477,871
2009: 5,221,092
2010: 7,836,698
2011: 9,966,572
2012: 10,599,355
2013: 10,878,121
2014: 12,104,793
Preservation with Access
• Preservation
– TRAC-certified
– Long-term commitments on digital content facilitate
planning, decision-making
• Discovery
– Bibliographic and full-text search of all materials
– Mechanisms for local loading of records
• Access and Use
– Full text search (all users)
– Public domain and open access works (all users)
– Collections and APIs (all users)
– Lawful uses of in-copyright works (members)
Content Sources
University of Michigan, 42.52%
University of California, 31.47%
University of Wisconsin, 5.06%
Cornell University, 4.02%
New York Public Library, 2.63%
Princeton University, 2.29%
Harvard University, 2.16%
Indiana University, 1.78%
University of Minnesota, 1.08%
University of Illinois, 1.05%
Library of Congress, 0.82%
Keio University, 0.73%
Penn State, 0.63%
University of Virginia, 0.46%
University of Chicago, 0.36%
Northwestern University, 0.34%
Duke University, 0.25%
Yale University, 0.22%
University of Florida, 0.09%
North Carolina State University, 0.03%
Boston College, 0.02%
Ohio State, 0.01%
Utah State University, 0.00%
Universidad Complutense, Columbia University, Texas A&M University, Purdue University, and University of North Carolina at Chapel Hill: 1.02%, 0.59%, 0.41%, 0.16%, and 0.01% (label-to-value pairing not recoverable from the chart)
Dates
0-1500, 0.04%
1500-1599, 0.07%
1600-1699, 0.01%
1700-1799, 0.01%
1800-1849, 3%
1850-1899, 10%
1900-1909, 4%
1910-1919, 4%
1920-1929, 4%
1930-1939, 4%
1940-1949, 4%
1950-1959, 6%
1960-1969, 11%
1970-1979, 13%
1980-1989, 14%
1990-1999, 14%
2000-2009, 10%
* As of February 17, 2014
Language Distribution (1)
The top 10 languages make up ~87% of all content
English, 49%
German, 9%
French, 7%
Spanish, 5%
Chinese, 4%
Russian, 4%
Japanese, 3%
Italian, 3%
Arabic, 2%
Latin, 1%
Remaining Languages, 13%
* As of February 17, 2014
Language Distribution (2)
The next 40 languages make up ~12% of total
7%: Polish; Portuguese
5%: Dutch; Hebrew; Hindi
4%: Indonesian; Korean; Swedish
3%: Croatian; Czech; Danish; Thai; Turkish; Urdu
2%: Bengali; Greek, Modern (1453-); Hungarian; Norwegian; Persian; Sanskrit; Tamil
1%: Armenian; Bulgarian; Catalan; Finnish; Greek, Ancient (to 1453); Malay; Malayalam; Marathi; Multiple languages; Panjabi; Romanian; Serbian; Slovak; Slovenian; Telugu; Turkish, Ottoman; Ukrainian; Vietnamese; Yiddish
0%: Nepali
* As of February 17, 2014
Access: Lawful uses of
in-copyright works
• Sensitive to multiple legal regimes
– Full-text search (everyone everywhere)
– Access to users who have print disabilities
(through member proxy in US, and where law
permits)**
– Access to works that are damaged or missing and
also out of print and unavailable (members in US
only)
**Terms and conditions at
http://www.hathitrust.org/access_use#ic-access
Collective Action: Copyright Review
• Copyright Review Management System
– CRMS-US: Works published in the US, 1923-1963
– CRMS-World: Works published outside the US (UK, Canada,
Australia, Spain)
– Through both projects over 450,000 items
reviewed
– 52% determined to have some public domain
status
Copyright Distribution
In Copyright: 65%
Open: 35%
– Public Domain World: 18%
– Public Domain (US): 12%
– US Gov't: 5%
– Open Access: <0.01%
– Creative Commons: <0.01%
Special Collections: Islamic Manuscripts
Current Initiatives
1. Developing a shared print monographs
archive
2. Expanding coverage and access to US
government publications
3. Expanding support for computational (non-consumptive) research
Shared Print Monographs Archive
• Ballot Initiative passed at the 2011 HT
Constitutional Convention (Con-Con)
– “To develop a print monographs archive
corresponding to volumes represented within the
HathiTrust”
• HathiTrust Board of Governors recently
approved appointment of a PSC-designed task
force to begin process
Photo by Mal Booth, CC-BY-NC-ND, https://www.flickr.com/photos/malbooth/5100435988
Why A Shared Print Archive Program
• Many regional efforts, but limited
national/international coordination
• Strengthens preservation commitments
– Connects both print and digital preservation
• Significant need and desire to reduce costs of
collection management and associated
footprint
Print Monographs
Archive Task Force
• Tom Teper, Chair (University of Illinois)
• Clem Guthro (Colby College)
• Robert Kieft (Occidental College)
• Erik Mitchell (University of California, Berkeley)
• Jake Nadal (ReCAP)
• Jo Anne Newyear Ramirez (University of British Columbia)
• Matthew Sheehy (Harvard University)
• Emily Stambaugh (California Digital Library)
• Karla Strieb (Ohio State University)
Questions: Monographs Archive
• What incentives will encourage participation?
• What services and access models are needed?
• What structures are needed to establish
commitments from multiple programs?
• What costs will be associated with the
program and how should they be allocated?
• Which libraries are most likely to benefit?
Government Documents Initiative
• Ballot Initiative: provide “expanded coverage
& enhanced access to U.S. Government
Documents.”
• Activities:
– Developing a registry of US Federal Government
Documents
– HathiTrust Board of Governors recently approved
appointment of a PSC-designed Advisory Group to
begin process
Photo detail from http://babel.hathitrust.org/cgi/pt?id=mdp.39015087610286;view=1up;seq=14
Existing Government Documents
• 568,219 documents as of September 2014
• 441,096 from the CIC, working in conjunction
with Google
• CIC libraries and the University of California
will continue to supply documents
Working Group Membership
• Mark Sandler (CIC)
• Prue Adler (Association of Research Libraries)
• Ivy Anderson (California Digital Library)
• Joni Blake (Greater Western Library Alliance)
• Kirsten Clark (Minnesota)
• Rick Clement (New Mexico)
• Elizabeth Cowell (Santa Cruz)
• Mark Phillips (North Texas)
• Jon Rothman (Michigan)
• Judy Russell (Florida)
• Barbie Selby (Virginia)
• Jeremy York (HathiTrust)
Working Group Tasks
• Develop an overall strategy for building a
comprehensive public domain corpus of U.S.
federal documents in HathiTrust
• Recommend investments as needed to pursue
the initiative
• Advise the Board on relevant policy issues
• Provide oversight and guidance as the project
develops
The Registry
• Goal: “…include metadata for the
comprehensive corpus of U.S. federal
documents. This will include materials
produced at U.S. government expense, in all
formats, at the item level, from 1789 to the
present.”
Computational Access
• HathiTrust distributes public domain datasets
• HathiTrust Research Center
– Developed collaboratively by Indiana University
and University of Illinois; launched July 2011
– Funding from the Sloan Foundation, Andrew W.
Mellon Foundation, and NEH Office of Digital
Humanities.
– Partially funded by HathiTrust (2014-2018)
Photo by Nicolas Nova, CC-BY-NC, https://www.flickr.com/photos/nnova/3455992927
Mission of the HT Research Center
• Mission: Enable researchers world-wide to
accomplish tera-scale text data-mining and
analysis
– Develop cyberinfrastructure to enable HPC access
to the HathiTrust Digital Library
– Develop cutting-edge software tools for
processing, analyzing text
– Develop translational tools and data that can be
used to enhance HathiTrust Digital Library services
to users
HTRC Governance
• Reports to the HathiTrust Board of Governors
– Advisory Group in formation
• HTRC Executive Committee
– J. Stephen Downie (Co-director), Professor and Associate Dean for
Research, University of Illinois GSLIS
– Beth Plale (Co-director and Chair), Director Data To Insight Center and
professor in the School of Informatics and Computing at Indiana
University
– Robert H. McDonald, Associate Dean of Libraries/Deputy Director
Data to Insight Center at Indiana University
– Beth Sandore Namachchivaya, Associate University Librarian for
Information Technology Planning & Policy at the University of Illinois
– John Unsworth, Vice Provost for Library & Technology Services and
Chief Information Officer at Brandeis University
Goals for HTRC
• Provide a persistent and sustainable structure to
enable original and cutting edge research.
– Leverage data storage and computational infrastructure at Indiana
& Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible
without the HTRC
• Enable scholars to fully utilize content of HathiTrust
Library while preventing intellectual property misuse
within U.S. copyright law.
– Provision secure computational and data environment for scholars
to perform research using HathiTrust Digital Library.
HathiTrust Research Center
UnCamp #3
March 30-31, 2015
University of Michigan
Ann Arbor, MI
Some Thoughts on the
Present and Future
How are we positioned?
• Our mission, collection, and the repository
operations are all strong.
• Our brand reputation is outstanding.
• Our work is solidly supported by the law.
• We have expanded access in unprecedented
ways.
• The partnership provides a solid base for
action.
• We have very important programs underway.
What needs thought?
• Strategy, mission, and role in the future
– Membership growth
– Collections program
– Public policy
– (Inter)National digital infrastructure
– Services for members and the public
• Organizational
– Engagement with researchers and libraries
– Enabling more participation in plans and action
– Standing on our own
Assumptions
• Our actions must align with the mission, goals,
and purpose across our partnership.
• A few additional assumptions
– We should pursue complementarity and
cooperation, not competition and duplication.
– Scale will continue to drive our strategies
– Potential partners are not just other libraries and
library organizations, but also readers, authors,
publishers.
How to find out more
• About: http://www.hathitrust.org/about
• Resources: http://www.hathitrust.org/resources
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter:
– http://www.hathitrust.org/updates
– RSS http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
– Large-scale Search
– Perspectives from HathiTrust
Thank you!
furlough@hathitrust.org
@MikeFurlough