Draft V 1 - Program for the Human Environment

Biodiversity Heritage Library
Work Plan and Budget Justification
Building a Digital Open Access Library for Biodiversity Literature
In February 2004, biologists, librarians, and information specialists gathered for a four-day workshop sponsored by the Pinhead Institute in Telluride, Colorado, to assess the feasibility of assembling a web-based Encyclopedia of Life (EOL). Among the outcomes of this workshop was a recommendation to digitize the literature of biodiversity. In 2005, an international symposium, Library and Laboratory: the marriage of research, data, and taxonomic literature, funded by the Alfred P. Sloan Foundation, was held at the Natural History Museum in London and attended by over 80 biologists, librarians, and computer scientists. Participants identified the lack of easy access to the published literature of biodiversity as one of the principal obstacles to efficient and productive research. Few scientific disciplines are as dependent on their historical literature. In the spring of 2005, representatives of ten major natural history museum libraries, botanical libraries, and research institutions joined in a collaborative effort to develop strategies to digitize the literature in an open access manner. From this partnership grew the Biodiversity Heritage Library (BHL) project.
The partners envision that a research scientist or student with Internet access, located anywhere in the world, will be able to search for specific information in all of the literature relevant to biodiversity and transparently link the documentation to relevant taxonomic, geographic, or other useful databases. Such a tool would eliminate much of the expensive, labor-intensive work of library research and speed the production of research results many times over.
The BHL partnership is essential because, while natural history museum and botanical garden libraries have collected biodiversity materials comprehensively, including many specialized and rare materials, no single library holds the complete corpus of legacy literature.
The partners’ collections represent a uniquely comprehensive assemblage of this literature.
Within two years of the start of this project, the BHL will provide approximately 25 million
digitized pages of literature to support multiple bioinformatics initiatives and research. For the
first time in history, the core of our natural history museum and botanical garden libraries will be
available to a truly global audience.
This proposal seeks $6,053,000 to carry out the first two-year phase of this ambitious and
far-reaching project and an additional $3,611,000 for the remaining three years of the project.
The Partnership
The participating institutions are:
American Museum of Natural History (New York, NY)
The Field Museum (Chicago, IL)
Harvard University Botany Libraries (Cambridge, MA)
Harvard University, Ernst Mayr Library of the Museum of Comparative Zoology (Cambridge, MA)
Marine Biological Laboratory / Woods Hole Oceanographic Institution (Woods Hole, MA)
Missouri Botanical Garden (St. Louis, MO)
Natural History Museum (London, UK)
The New York Botanical Garden (New York, NY)
Royal Botanic Gardens, Kew (Richmond, UK)
Smithsonian Institution (Washington, DC)
The BHL members have formed a Board of Directors with elected officers and have signed
agreements committing them to the project. Other participants may be sought at a later time.
Methodology:
BHL member institutions will scan (or have already scanned) their own scientific publications to contribute to the BHL. Working with the Internet Archive, a non-profit partner, members will also scan other volumes from their collections that are not covered by copyright or for which permissions have been obtained, so that large segments of literature can be added to the corpus quickly. Ongoing negotiations with commercial and learned society publishers are promising: newer literature may be made available with permissions. As a community-based partnership, the BHL should be a trusted party for negotiating with copyright owners. Small society publishers need help scanning and storing their publications, and if we provide this service, some are pleased to have their content accessible through the BHL portal. A set of clearly defined working relationships will be established as part of the project. User input to the process will be solicited regularly.
The BHL selected the Internet Archive (IA), an organization with demonstrated technical
qualifications for mass scanning and long-term digital content management, to scan the bulk of
the literature through scanning centers. A scanning center consists of multiple high-speed, state-of-the-art digital book scanners, staffed for two shifts daily, each able to handle large numbers
of volumes per day. The IA will perform imaging, OCR, association of standard metadata (derived
from MARC records provided from the libraries’ catalogs) with the digitized files, file
arrangement, security, maintenance of optimal book conservation practices and delivery of the
completed scans in conformance with agreed project standards. Article level access to journals
will be provided.
BHL Portal with Taxonomic Intelligence:
The digitized literature will be served to users from the Biodiversity Heritage Library
Portal to be hosted initially by the Missouri Botanical Garden. The BHL Portal will create an
innovative research environment that will vastly accelerate research in life sciences and
conservation: a freely accessible, service-based Web environment formed by coupling existing databases with digitized, searchable images and OCR text of heritage literature. The BHL Portal will use existing informatics tools to identify strengths and overlap across the participating institutions' libraries and to help solve the problems associated with the naming of organisms over time.
Taxonomic Intelligence:
The digitization of a major corpus of biodiversity literature will advance world
biodiversity initiatives significantly, but only to the extent that users can find relevant content.
Names of organisms annotate content about species. However, the use of names for information
retrieval is impeded because names are neither stable nor consistent. One organism may have
more than one name. This prevents simple automated indexing services from bringing together
complementary data. Moreover, about 1% of names change each year, so the many-names-for-one-organism (synonym) problem accumulates over time and will be particularly
severe in heritage literature. Visitors to traditional library scanning projects who know
organisms by their colloquial (common) names may be unable to find content unless they know
the names used in the source documents. These issues will reduce the utility of the millions of
pages of primary biodiversity information to be generated by the BHL without the added tools
intended in the BHL Portal. The uBio team from the Marine Biological Laboratory / Woods Hole
Oceanographic Institution (MBLWHOI) library has assembled an array of taxonomically
intelligent services designed to overcome these problems.
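The synonym problem described above can be illustrated with a minimal sketch, assuming a reconciliation table that groups a current scientific name, an older synonym, and vernacular names under one group ID. The data, the function names (`build_lookup`, `expand_query`, `find_pages`), and the plain substring matching are hypothetical illustrations, not the interface of the uBio services.

```python
# Hypothetical reconciliation data (illustrative only, not drawn from NameBank):
# each group collects the current name, an older synonym, and vernacular names
# that all refer to the same organism.
RECONCILIATION_GROUPS = {
    "group-1": ["Puma concolor", "Felis concolor", "cougar", "mountain lion", "puma"],
}


def build_lookup(groups):
    """Map each lower-cased name string to the ID of its reconciliation group."""
    lookup = {}
    for group_id, names in groups.items():
        for name in names:
            lookup[name.lower()] = group_id
    return lookup


def expand_query(term, groups, lookup):
    """Return every name string in the term's group, or just the term if it is unknown."""
    group_id = lookup.get(term.lower())
    if group_id is None:
        return [term]
    return list(groups[group_id])


def find_pages(term, page_texts, groups, lookup):
    """Return IDs of pages whose text contains any variant of the query term."""
    variants = [v.lower() for v in expand_query(term, groups, lookup)]
    return sorted(
        page_id
        for page_id, text in page_texts.items()
        if any(v in text.lower() for v in variants)
    )


lookup = build_lookup(RECONCILIATION_GROUPS)
pages = {"p1": "Felis concolor was collected in 1850.", "p2": "Notes on Canis lupus."}
print(find_pages("mountain lion", pages, RECONCILIATION_GROUPS, lookup))  # ['p1']
```

A reader who searches for "mountain lion" thus reaches a page that only uses the older name Felis concolor.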
Selection of Materials to be Scanned
Because BHL requires moving physical objects (books) and contracting for expensive scanning machines that are built to order and require extensive setup in specific locations, locations and materials must be selected according to clear priorities and reasons. BHL involves 10 separate institutions with different collection strengths. Materials will be prioritized through a multi-pronged approach that considers thematic areas, date of publication, and quantity of material. BHL Directors have organized a “BHL Collections Working Group,” which will refine and further articulate the thematic areas in March and April. The initial titles to be scanned will be either in the public domain or covered by copyright permission. Initial thematic assignments will be guided by input from the EoL Secretariat. In addition to the thematic focus provided by the EoL Secretariat, the BHL will also analyze such major indexes as Index Kewensis, Sherborn’s Index Animalium, and Neave’s Nomenclator Zoologicus. BHL is in discussions to
obtain permissions to mine Zoological Record to determine those journals that have been most
cited in the literature of species identification and description. This will provide a priority list of
journals and monographs. Scanning from a prioritized list created through citation analysis
ensures that the BHL contains the most critical works for our audience of scientists and scholars.
Additional thematic focus will be possible after scanning, once taxonomic intelligence has been applied to extract all species names. For example, the species names from the Ocean Biogeographic Information System (OBIS) can then be used as a filter to tag articles or monographs containing marine data. Other name sources can be employed in a similar manner. BHL will also
remain flexible enough to take timely advantage of offers from significant learned society
journals to digitize back holdings in an open access manner.
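Two of the selection steps above, ranking journals by how often they are cited and tagging scanned documents against a name list such as the OBIS species names, can be sketched as follows. This is a simplified illustration under stated assumptions: the citation list and document names are hypothetical, the one-item name list merely stands in for OBIS data, and exact string matching stands in for taxonomic intelligence.

```python
from collections import Counter


def rank_journals(cited_journals):
    """Rank journal titles by citation count, most cited first.

    Counter.most_common keeps first-encountered order among equal counts.
    """
    return [title for title, _count in Counter(cited_journals).most_common()]


def tag_by_name_list(doc_names, filter_names):
    """Return IDs of documents that mention at least one name in the filter list."""
    wanted = set(filter_names)
    return sorted(doc_id for doc_id, names in doc_names.items() if wanted & set(names))


# Hypothetical inputs: journal titles taken from citing records, and names found in
# two scanned documents. The one-item "marine" list stands in for an OBIS name list.
citations = ["Journal A", "Journal B", "Journal A", "Journal C", "Journal A", "Journal B"]
print(rank_journals(citations))  # ['Journal A', 'Journal B', 'Journal C']

doc_names = {"doc-1": ["Balaenoptera musculus"], "doc-2": ["Puma concolor"]}
print(tag_by_name_list(doc_names, ["Balaenoptera musculus"]))  # ['doc-1']
```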
As the EoL Secretariat establishes thematic areas of emphasis for the EoL, the BHL libraries will select texts that best support each thematic area. Initial thematic assignments for 2007-2008 are indicated in the initial thematic area of Appendix A, spreadsheet bhlfivegrant.xls. For 2009 and later years, the acronym EoLS indicates that thematic assignments will be based on guidance mediated by the EoL Secretariat. If that guidance is available in 2008, work can be redirected toward that area.
Deliverables
Deliverable 1: Metadata Repository and Analysis
The BHL has received $50,000 from the Richard P. Lounsbery Foundation, which has been
partially applied to the creation of the BHL Metadata Repository and initial collection analysis.
Members are now creating a union list of biodiversity serials to serve as the basis for distributing scanning responsibility, because a single serial title can comprise hundreds of volumes and often exceeds 100,000 pages. Digitizing serials will be a cost-effective method for mounting a large amount of the most useful scientific material online in the shortest amount of time. An early collection analysis indicated considerable overlap among participant libraries’ holdings. A new collection analysis will help establish a productive division of labor among members and further inform digitization priorities. This activity will be completed by the end of the first year. BHL will contract with the Online Computer Library Center (OCLC) for the WorldCat Collection Analysis tool (using Lounsbery Foundation and member contributed funds), which allows consortia to analyze their collections and compare them to other collections. In this way, members will be able to compare group holdings by specific subject areas and through time, e.g., pre-1923 publications. The tool includes a web interface that allows users from any library to view collections and choose from multiple parameters, including subject, titles and title counts, publication date, language, format, and audience. The collection analysis will allow the BHL to further identify overlap and uniqueness across all its member library collections.
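The overlap analysis itself will come from the OCLC tool; the sketch below only illustrates, under stated assumptions, how overlap and uniqueness could be computed from holdings lists and how shared titles might then be divided among members. The library names, title identifiers, and the greedy "fewest assignments" rule are hypothetical, not the BHL's agreed method.

```python
from collections import defaultdict


def analyze_overlap(holdings):
    """Split titles into those held by one library and those held by several.

    holdings maps a library name to the set of title identifiers it holds.
    Returns (unique, shared): unique maps library -> titles only it holds;
    shared maps title -> set of libraries holding it.
    """
    holders = defaultdict(set)
    for library, titles in holdings.items():
        for title in titles:
            holders[title].add(library)
    unique, shared = defaultdict(set), {}
    for title, libraries in holders.items():
        if len(libraries) == 1:
            unique[next(iter(libraries))].add(title)
        else:
            shared[title] = libraries
    return dict(unique), shared


def assign_scanning(shared):
    """Greedily give each shared title to the holding library with the fewest assignments."""
    load = defaultdict(int)
    assignment = {}
    for title in sorted(shared):
        # Sorting the candidates makes ties resolve alphabetically, so results are repeatable.
        library = min(sorted(shared[title]), key=lambda lib: load[lib])
        assignment[title] = library
        load[library] += 1
    return assignment


# Hypothetical holdings for two member libraries.
holdings = {"Library A": {"t1", "t2", "t3"}, "Library B": {"t2", "t3", "t4"}}
unique, shared = analyze_overlap(holdings)
print(assign_scanning(shared))  # {'t2': 'Library A', 't3': 'Library B'}
```

Titles held by only one member are simply scanned by that member; only the overlapping titles need an explicit division of labor.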
Deliverable 2: Digitized literature.
The volumes will be scanned with a unique digitization system called the Scribe,
developed by the Internet Archive (IA), which facilitates the scanning and description of the
object, as well as depositing it for storage, serving, and further reuse. Each digitized volume will
be encoded with authoritative metadata from the contributing library’s catalog (such as Library of Congress Subject Headings and other descriptive metadata about the title) and will describe the overall structure of the volume and the pages within it. The digitized pages, derivative images and files, and all associated metadata will be deposited at the IA for further analysis and redistribution via the BHL portal. The IA will host and serve the image files for searches from
the BHL Portal. The text files, enhanced by taxonomic intelligence (see below), will be hosted
initially from the Missouri Botanical Garden (MBG) and, as the design evolves, may continue to
be hosted from there or may move to the MBLWHOI depending on architecture decisions in the
first year of design. These materials will be referenced by persistent globally unique identifiers
(GUIDs) at the title, volume and page level, so that they can be integrated across existing
bibliographic and taxonomic citation databases, like Tropicos, NameBank, and ZooRecord. By
the end of year two we expect to have more than 25 million scanned pages online and a total of
at least 40 million pages online by the end of year five.
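One possible way to give persistent identifiers at the title, volume, and page levels is to derive name-based (version 5) UUIDs hierarchically, so that the same page always receives the same identifier. This is an assumption, not a scheme named in the proposal; the example.org base URL, the identifier strings, and the function names are hypothetical.

```python
import uuid

# Hypothetical namespace: the proposal does not specify an identifier scheme, so this
# derives name-based (version 5) UUIDs from an illustrative base URL.
BHL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/bhl")


def title_guid(title_id):
    """Identifier for a title, derived from the hypothetical BHL namespace."""
    return uuid.uuid5(BHL_NAMESPACE, f"title:{title_id}")


def volume_guid(title_id, volume_number):
    """Identifier for a volume, derived from its parent title's identifier."""
    return uuid.uuid5(title_guid(title_id), f"volume:{volume_number}")


def page_guid(title_id, volume_number, page_sequence):
    """Identifier for a page, derived from its parent volume's identifier."""
    return uuid.uuid5(volume_guid(title_id, volume_number), f"page:{page_sequence}")


print(page_guid("hypothetical-title-42", 3, 17))
```

Because version 5 UUIDs are computed from their inputs, any system that knows the title, volume, and page sequence can regenerate the same identifier without a central lookup.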
As digitized objects are added to the IA, existing automated processes will complete two
major workflow steps—text conversion and keyword indexing—that transform a digital image of
a page into a text-based XML document, on which scientific names and other keywords will be
annotated using semi-automated natural language processing tools that will be developed. These
annotations and relationships between scanned images and keywords will be identified, recorded
in the BHL Portal database, and then exposed within additional XML files.
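A minimal sketch of the page-level step described above, assuming OCR text is already available and a list of known name strings is supplied. Dictionary matching with a regular expression stands in for a real name-finding tool; it is not TaxonFinder, and the `<page>`, `<text>`, `<names>`, and `<name>` element names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def annotate_page(page_id, ocr_text, known_names):
    """Wrap one page's OCR text in XML and record each known name string found, with its offset.

    Simple dictionary matching stands in for a real name-finding tool such as TaxonFinder.
    """
    page = ET.Element("page", id=page_id)
    ET.SubElement(page, "text").text = ocr_text
    names_el = ET.SubElement(page, "names")
    if known_names:
        # Try longer names first so a longer match wins over a shorter name it contains.
        alternatives = sorted(known_names, key=len, reverse=True)
        pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in alternatives) + r")\b")
        for match in pattern.finditer(ocr_text):
            ET.SubElement(names_el, "name", offset=str(match.start())).text = match.group(1)
    return ET.tostring(page, encoding="unicode")


print(annotate_page("p1", "Specimens of Puma concolor and Felis concolor.",
                    ["Puma concolor", "Felis concolor"]))
```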
Deliverable 3: Integration of taxonomic intelligence.
Building on existing tools and services developed at uBio to index organism names (NameBank, which contains approximately 9.5 million name strings) and their associated hierarchies (ClassificationBank, which contains over 80 classifications), taxonomic intelligence will be integrated into the documents as soon as they are digitized, using an established named entity recognition tool, TaxonFinder. (See the EoL Informatics Work Plan for further details.)
The integration of taxonomic intelligence, via links from name strings located within each
XML file generated (Deliverable 2), will enable linkages to other relevant indexed content in
EoL and other web-accessible name-based sources. In cases where a name string cannot be
resolved to an existing NameBank string or reconciliation group (a set of strings that objectively
represent the same organism), it will be passed to other complementary services (e.g., EoL
WorkBench; see EoL Technical Documents). The types of organisms that are associated with
each digitized document will be characterized using the taxonomic groupings reflected in
ClassificationBank (including the proposed EoL Union taxonomy). This will include the
generation of descriptive statistics pertaining to organisms relative to other comparative axes
(e.g., temporal or geographic). A complete list of name strings as they appear in each digitized
document, reconciled contemporary form of name string, and any other relevant metadata will be
incorporated into the XML index files stored initially at the Missouri Botanical Garden.
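The reconciliation step might look roughly like this, assuming a lookup table from verbatim strings to a contemporary name and a higher-level group. The table entries, the unresolved string `Xus yus`, and the `fallback` callback (a placeholder for a complementary service such as EoL WorkBench) are hypothetical.

```python
from collections import Counter

# Hypothetical reconciliation table (illustrative only): verbatim name string ->
# (reconciled contemporary name, higher-level group used for descriptive statistics).
RECONCILED = {
    "Felis concolor": ("Puma concolor", "Animalia"),
    "Puma concolor": ("Puma concolor", "Animalia"),
    "Quercus alba": ("Quercus alba", "Plantae"),
}


def build_name_index(verbatim_names, reconciled, fallback):
    """Reconcile one document's name strings and count them by group.

    Strings missing from the table are collected and handed to `fallback`, a
    placeholder for a complementary service that might resolve them later.
    Returns (rows, unresolved, group_counts).
    """
    rows, unresolved = [], []
    group_counts = Counter()
    for verbatim in verbatim_names:
        if verbatim in reconciled:
            current, group = reconciled[verbatim]
            rows.append({"verbatim": verbatim, "reconciled": current, "group": group})
            group_counts[group] += 1
        else:
            unresolved.append(verbatim)
            fallback(verbatim)
    return rows, unresolved, group_counts


pending = []  # stands in for a queue feeding a complementary service
rows, unresolved, counts = build_name_index(
    ["Felis concolor", "Quercus alba", "Xus yus"], RECONCILED, pending.append
)
print(rows[0])     # {'verbatim': 'Felis concolor', 'reconciled': 'Puma concolor', 'group': 'Animalia'}
print(unresolved)  # ['Xus yus']
```

Each row keeps both the string as printed and its reconciled form, which is what allows the index files to record the name as it appears alongside its contemporary equivalent.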
Deliverable 4: Development of a robust community portal with open, distributed architecture.
Users today expect sophisticated presentation and collaborative, interactive Web
resources. To make this Web portal operative, BHL will program an intelligent, customizable
interface into all parts of the repository. The interface will enable users to conduct a search for
particular terms in the BHL repository and find pages where these terms occur throughout the
entire collection of digitized literature. Users will be able to obtain a bibliography of the
literature containing the keyword(s) or view the individual scanned pages (or the indexed text if
they prefer) and see all of the keywords that appear on the page.
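The keyword search and bibliography features described above rest on an inverted index from terms to pages. The sketch below is a minimal in-memory version, assuming each page carries its volume title and OCR text; the page IDs, titles, and the simple alphabetic tokenizer are hypothetical choices for illustration, and a production portal would likely rely on a dedicated search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """pages maps page ID -> (title, OCR text); returns an inverted index term -> page IDs."""
    index = defaultdict(set)
    for page_id, (_title, text) in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index, pages, query):
    """Return (matching page IDs, bibliography of titles) for pages containing every query term."""
    terms = tokenize(query)
    if not terms:
        return [], []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    bibliography = sorted({pages[page_id][0] for page_id in hits})
    return sorted(hits), bibliography


pages = {
    "p1": ("Hypothetical Title A", "Puma concolor in Peru"),
    "p2": ("Hypothetical Title B", "Puma concolor and Quercus alba"),
    "p3": ("Hypothetical Title A", "Quercus alba"),
}
index = build_index(pages)
print(search(index, pages, "Puma concolor"))
# (['p1', 'p2'], ['Hypothetical Title A', 'Hypothetical Title B'])
```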
In addition to the internal search capabilities and outbound links, the BHL Portal
will provide the mechanism necessary to accept queries from external databases, libraries, or
individual Web users, returning appropriate images or bibliographic references. Additional
authoritative content available on the Internet will be linked via the incorporation of tools like
LinkIT and uBioRSS, which respectively create dynamic hyperlinks to authoritative Web sites
and enable access to recently published knowledge (e.g., literature, scientific news, blogs,
pictures of the day, distribution data, and specimen collections). Both the LinkIT and uBioRSS
tools will be modified to link resources that can meet the specific objectives set forth by the
overall EoL vision.
This project will be the first, through automated markup of the digitized literature and
integration of taxonomic intelligence, to provide access to both the literature and supportive
natural history data in an interactive, Web-based environment. The combined search options and
broad interactive capabilities are unique to this project and will be the focus of initial
development by the end of year two. However, scanning can begin immediately, since these features will be enabled by post-processing of the files. By the end of year five, advanced search
functionality and end-user tools determined through ongoing usability analysis and feedback will
be developed and deployed.
Testing of the system will be critical to determine whether the BHL Portal reaches its
intended audience and will include internal review by the BHL partners and reviews by other
scientists and IT professionals. We will undertake testing for all components and subsequent
modification and retesting as needed throughout the project. The BHL Portal will undergo
intensive testing, including post-deployment surveys and online comment and suggestion options.
The BHL Portal will communicate with the IA and other service providers assisting the project using web services, a mechanism by which data can be shared among disparate systems using standard protocols and XML. Building web services on top of the BHL Portal will allow us to manage the data separately while still communicating the relationships between the data sets and the records within them. These services can be “published” as entry points for other communities to query and interact with the BHL data. For example, a conservation organization may want to find literature on a given scientific name but display that literature within its own application. The organization’s application can call BHL’s Web services to determine what digitized literature is available for the scientific name in question, and the results will provide links from the organization’s existing application to the BHL digitized literature. In this way, Web services will meet the specific needs of each such use while continuing to make the raw data available to others.
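From the conservation organization's side, the exchange described above amounts to building a query for a scientific name and reading links out of an XML response. The endpoint URL, the `name` parameter, and the `<results>`/`<item>` response layout below are all hypothetical, since the proposal does not define the BHL web service interface; the network call itself is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint and response format: the proposal does not define the BHL
# web service interface, so both are placeholders for illustration.
BASE_URL = "https://example.org/bhl/api/name-search"


def build_request_url(scientific_name):
    """Build the query URL an external application might call for one scientific name."""
    return BASE_URL + "?" + urlencode({"name": scientific_name})


def parse_response(xml_text):
    """Extract (title, page URL) pairs from a hypothetical <results><item> response."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("url")) for item in root.iter("item")]


sample = """<results>
  <item><title>Hypothetical Journal, vol. 3</title><url>https://example.org/bhl/page/1</url></item>
</results>"""
print(build_request_url("Puma concolor"))
print(parse_response(sample))
```

In a live application, the URL from `build_request_url` could be fetched with `urllib.request.urlopen` and the response body passed to `parse_response`, leaving the organization free to display the links in its own interface.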
Deliverable 5: Permissions from Publishers.
The BHL will negotiate permissions from non-profit biodiversity learned societies to
digitize their back issues that are still covered by copyright and to aggregate the digitized files
with other BHL content as well as provide free copies to the learned societies. The BHL will
also seek permissions from commercial publishers.
Deliverable 6: Digital Curation
Digital curation is a critical part of making the BHL a sustainable project that will ensure
sustained, persistent access to the BHL content for centuries using the best available technology
and administrative structures for preservation of digital files. A plan for extending collaboration
from the original core group of institutions to include other natural history and botanical libraries
will be pursued as a secondary step in the project evolution. The metadata, image files, digital
derivatives, and text files generated during this project will create a significant resource that will
require ongoing stewardship beyond the timetable outlined in the proposal. The BHL Board of
Directors will develop a plan for archival storage and appropriate migration of data within the
second year. The plan may involve multiple dark archives and the participation of primary
publishers of content, such as commercial publishers and society publishers, as well as content
aggregators such as OCLC. Working with the EoL Secretariat, their host institutions, and major
stakeholders, BHL Directors will develop a plan for long-term administrative/corporate
structures to ensure that the BHL digital assets remain openly available to users in the future.
Whether this will require incorporating as a 501(c)(3) organization, establishing a UK trust, or some other arrangement will be determined by a study in the third and fourth years.
See Deliverables and Schedule, below, for the implementation schedule.
Deliverables and Schedule (estimated output)

Deliverable 1: Metadata Repository and Analysis
  5/2006 – 6/2007: First analysis; broad collection strengths; initial allocation of scanning priorities
  7/2007 – 12/2007: Clean metadata repository prototype; union list of serials; OCLC analysis; refinement of selection process based on thematic direction from EoL Secretariat
  1/2008 – 12/2008: Harvesting, ingest, and export of BHL metadata to/from major sources, e.g. OCLC
  1/2009 onward: Updated as new collections are acquired and as volumes are scanned

Deliverable 2: Digitize literature
  5/2006 – 6/2007: Negotiate agreements with Internet Archive; install scanners in selected locations; negotiate agreements with regional centers to use their facilities where possible
  7/2007 – 12/2007: Hire library technicians for pulling and moving volumes. Output: 5,990,000 digitized pages (~17,114 – 29,950 volumes)
  1/2008 – 12/2008: Output: 20,508,000 digitized pages (~58,594 – 102,540 volumes; more if fundraising is successful); content resolvable to the title, volume, and page level
  1/2009 – 12/2009: Output: 17,676,000 digitized pages (~50,503 – 88,380 volumes; more if fundraising is successful)
  1/2010 – 12/2010: Output dependent on fundraising
  1/2011 –: Output dependent on fundraising

Deliverables 3 & 4: BHL Portal with open distributed architecture and integration of Taxonomic Intelligence (TI)
  5/2006 – 6/2007: Prototype development; requirements definition
  7/2007 – 12/2007: Import of OCR files from IA; production testing of first iteration of TI; searching: simple public search interface, full-text searching
  1/2008 – 12/2008: Sophisticated public search interface with TI to collocate taxon names in text; ability to include variant and vernacular names; ingest, harvesting, and export of files fully functional; integration with existing indices
  1/2009 – 12/2009: Ingest, harvesting, and export of files fully functional; prototype open community interface; resolve citations using indices and matching algorithms; ability to limit searches to article titles and taxonomic treatments
  1/2010 – 12/2010 and 1/2011 –: Ongoing maintenance and refinement; if additional funding is obtained, operationalize full open community functionality

Deliverable 5: Learned societies and publishers
  5/2006 – 6/2007: 3 prototype permissions documents signed; presentations at conferences; legal strategy determined
  7/2007 – 12/2007: 20 permissions obtained; working agreement with BioOne obtained
  1/2008 – 12/2008: Working with EoL Secretariat, develop global marketing plan for reaching as many learned society publishers as possible; more permissions; implement plan
  1/2009 onward: Additional permissions obtained; implement global marketing plan

Deliverable 6: Digital Curation
  1/2008 – 12/2010: Planning for BHL administrative structure to ensure a long-term (100-year) plan for management of BHL digital assets
  1/2011 –: Implement new administrative structure, e.g. incorporation as a 501(c)(3) or UK Trust

Fundraising
  7/2007 – 12/2007: Prepare funding proposals for additional projects, e.g. 100-year+ stewardship planning and licensed content
  1/2008 – 12/2008: Prepare funding proposals for additional projects, e.g. rare book collections
  1/2009 – 12/2009: Prepare funding proposals for additional projects, e.g. underserved taxa
  1/2010 – 12/2010 and 1/2011 –: Prepare funding proposals for additional projects, e.g. paleobiology
BHL Member Contributions and Fundraising
From 2002 through 2007, BHL member institutions will have contributed more than $1,200,000 to the BHL, covering such costs as digitizing 1,500,000 pages of biodiversity literature, holding planning meetings, and proportionate salary expenses for staff directly involved in BHL work. These funds have come from operating budgets, internal special purpose funds, and external development efforts. Additional local and coordinated fundraising efforts will be a major focus of the BHL. BHL member institutions will also redirect existing staff, such as library technicians to move materials, catalogers, analysts, and others, to BHL support work. The budget documents in Appendix A contain a spreadsheet, Estimates of BHL Institution Contributions, that reflects redeployment of existing staff, potential augmentation from EoL Cornerstone institution fundraising, and the target fundraising goal for the BHL libraries. As mass scanning starts, the Contributions spreadsheet will be refined.
Outcomes of Scanning the Biodiversity Literature
Scanning the biodiversity literature and providing the text through a common portal with sophisticated search tools will produce a number of long-term benefits for the global biology community. These outcomes can be grouped under the following headings:
• Improving the efficiency of research in the biology domain
• Improving access to information for non-museum biologists
• Repatriation of information about developing world species
• Capacity building in the developing world
• Preservation of rare and fragile materials
These outcomes may be implicit or explicit. BHL will put in place Web-based tools to survey the users of the BHL Portal to determine who they are, what content they use, and how they approach it.
It is anticipated that a brief online questionnaire in the Web interface will pop up for a certain
percentage of users. The questionnaire will ask questions about the users’ purpose in visiting the
BHL Portal site, what benefits they expect to gain, and what savings they may achieve. These are
mainly ‘soft’ measures, but will give us a sense of the changes in the customer base and their
usage of materials.
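The proposal says the questionnaire will appear for "a certain percentage of users" without saying how users are chosen. One common approach, sketched here as an assumption, hashes a session identifier into 100 buckets so that the same visitor gets a consistent decision; the session IDs and the 10% rate are hypothetical.

```python
import hashlib


def show_survey(session_id, percent):
    """Return True for roughly `percent`% of sessions, consistently for a given session ID.

    Hashing the session ID, rather than drawing a random number on each page view,
    means a visitor who is selected stays selected for the whole session.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


selected = sum(show_survey(f"session-{i}", 10) for i in range(10_000))
print(selected)  # roughly 1,000 of 10,000 hypothetical sessions
```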
However, BHL members believe that some measurable financial outcomes are also
achievable. For instance, the Smithsonian Institution Libraries hosts over 600 non-U.S. researchers annually, many from developing countries lacking libraries with significant biodiversity collections, who travel simply to use the volumes in its collection. Assuming an average two-day visit costs $1,000, including flights and lodging in Washington, D.C., a 50% drop in such visits would save the research community an estimated $300,000 annually. If 30% of the 15,000 visitors to the Natural History Museum in London from the UK and other countries were able to save two days of their visit by consulting the literature at their home base, this would save the visitors $1 million per annum. The Library of the Royal Botanic Gardens, Kew, hosts over 170 visitor days annually from researchers from developing countries, to whom it provides 4,000 photocopied sheets free of charge. See Appendix B for a sample of the thanks our libraries have already started receiving for our modest efforts to date.
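The Smithsonian estimate above is a simple product of visitors, the share of visits avoided, and cost per visit. The sketch reproduces it using only the figures in the text; because no per-visit cost is given for the Natural History Museum, only its number of avoided visits is computed. The function names are hypothetical.

```python
def visits_avoided(visitors, percent_avoided):
    """Number of visits avoided when `percent_avoided`% of visitors no longer need to travel."""
    return visitors * percent_avoided // 100


def annual_savings(visitors, percent_avoided, cost_per_visit):
    """Estimated annual saving: visits avoided multiplied by the average cost per visit."""
    return visits_avoided(visitors, percent_avoided) * cost_per_visit


# Smithsonian Institution Libraries: 600 visiting researchers, 50% fewer visits,
# $1,000 per two-day visit.
print(annual_savings(600, 50, 1_000))  # 300000
# Natural History Museum, London: 30% of 15,000 visitors.
print(visits_avoided(15_000, 30))  # 4500
```

Integer arithmetic is used so the percentages introduce no rounding error for these inputs.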
In order to measure these potential outcomes, BHL will spend some time at the beginning of the project ensuring that our approach to counting and surveying customers is consistent across the BHL partners. This will enable us to set an early baseline from which to measure the impacts we expect our outcomes to achieve. To ensure consistency of approach, BHL will coordinate the methodology with the other EoL partners.
Key Outcomes
Improving the efficiency of research in the biology domain:
• BHL expects that web availability of the majority of the biodiversity literature will have a long-term impact on the way taxonomic science is done. The ability to bring together all the literature on a given taxon or group at an individual’s desktop will increase efficiency and speed up the process of taxonomic revision. Large paper collections of individual articles will become a thing of the past.
• The BHL will lead to an acceleration of the taxonomic process in both developed and developing countries. This supports the wider objectives of the EoL and enables greater integration of taxonomic effort globally.
• Full-text searching and taxonomic intelligence will ‘unearth’ inaccessible information from older material. This will allow new analysis and data mining by bringing together material from different institutions to provide a new synthesis.
Output Measures:
• BHL expects the online questionnaire to show changes in the efficiency and effectiveness of individual scientists (e.g. Appendix B: 4, 5, 6).
Improving access to information for non-museum biologists:
• The BHL will expose the biodiversity literature to other biological science disciplines (ecology, forestry, land planning, environmental assessment, etc.) and to a broad range of other potential users in medicine, history, and the social sciences.
• The BHL will also provide improved access for non-taxonomists to original descriptions and identifications. Such access could provide powerful new tools to medical researchers or environmental and ecological monitoring organizations, where precise species identification is critical for their work.
Output Measures:
• The online questionnaire will seek to obtain sector information about users and ask them about the application of the BHL materials in their sector (e.g. Appendix B: 7).
Repatriation of information about developing world species:
• Full access to the published literature will effectively repatriate biodiversity information to the countries where the described organisms occur. In many cases in the developing world, local taxonomists and para-taxonomists do not have access to any of the literature on their local biodiversity.
Output Measures:
• The online questionnaire will seek information on the country of origin of the scientist using the BHL and the territory in which the scientist intends to use the information (e.g. Appendix B: 1, 2, 3, 4, 5, 6).
Capacity building in the developing world:
• Access to the BHL content will support curricula for training new taxonomists in developing countries. This can help mitigate the “Taxonomic Crisis” (http://www.actionbioscience.org/biodiversity/page.html).
• The number of out-of-town visitors who need to visit our library collections should fall significantly. As indicated above, savings of at least $1 million per annum are achievable. This ‘saving’ will persist and should help to boost local capacity.
Output Measures:
• The online questionnaire will seek information on the country of origin of the scientist and on the benefits that will accrue locally (e.g. Appendix B: 2).
Preservation of rare and fragile materials:

The BHL partners will be able to conserve their rare and original volumes through
minimizing handling.

Where appropriate, BHL will have the ability to print new facsimile titles on-demand (for
instance, for teaching or fieldwork).

BHL partners may be able to achieve cost savings by the use of remote storage facilities
for the print materials that have been digitized.

The BHL will provide a low cost “not-for-profit” mechanism for small learned and
professional societies to make the backfiles of their journals digitally available.
Output Measures:

The portal will enable us to track the use of rare and fragile materials, giving us
a measure of increased usage while handling is minimized.
We will track the learned societies with whom we have agreements as part of the wider
engagement of the EoL with the taxonomic community.
Budget
The Biodiversity Heritage Library Annual Operating Budget (Appendix A, attached spreadsheet
bhlfiveyeargrant.xls) reflects all costs to be funded by the John D. and Catherine T. MacArthur
Foundation and the Alfred P. Sloan Foundation over five years. The budget does not distinguish
between costs funded by the two foundations and leaves that decision to representatives of the
respective foundations. The Annual Page Output (Appendix A, attached spreadsheet
bhlfiveyeargrant.xls) reflects our best estimate of the number of digitized pages the BHL
project will produce. Each volume count is based on an estimate of either 350 or 200 pages per
volume, as indicated. More volumes can be scanned if local supplementary fundraising
is successful.
BHL Project Director:
As with any large collaborative project involving multiple institutions, the BHL requires
high-level coordination and management. The BHL Director will negotiate digitization-rights
agreements with learned societies; draft contracts and agreements with major partners such as
the IA; manage, track, report on, and disburse funds for BHL work; create enduring
partnerships with peer initiatives; prepare fundraising proposals; oversee the BHL Portal
Development Team; liaise with the EoL Secretariat to keep efforts aligned; and ensure that
BHL libraries and the IA deliver the expected output. The BHL Project Director will also
need to make onsite visits to help implement and evaluate the scanning operations and to
take part in BHL planning efforts. The BHL Director will be housed at the
Smithsonian Institution Libraries in Washington, D.C.
Local Salaries/Library Technicians
Finding volumes and checking them for scanning suitability, tracking them through local
circulation systems, delivering them to the scanning location or pick-up point, retrieving the
volumes, checking for damage, and reshelving are all simple but time-consuming tasks. At the
quantities required for the BHL project, their combined workload becomes enormous. In many
libraries, existing staff cannot fully absorb the high production levels the project requires.
Metadata Repository:
The metadata repository will be developed using existing funds, but ongoing basic
maintenance will be required.
BHL Portal
Funds will cover 50% of the cost of a BHL Technical Manager at the Missouri Botanical Garden
and two full-time programmers working under his direction for two years. In prototypes to date,
Missouri Botanical Garden staff have demonstrated the ability to manage development of this
complexity and to deliver results as required.
Direct Scanning Costs
These are the payments to our scanning and hosting service provider, the Internet Archive
(IA), a non-profit organization dedicated to “Universal Access to Human Knowledge.” The IA
has a proven track record of delivering and serving high-quality digital pages and their text
versions at extremely low cost. Its cost model depends on concentrating as large a volume of
material as possible into as short a period as possible. The IA can offer such compelling prices
because its model keeps the full complement of Scribe scanners fully occupied for a
minimum of two years. Any slowdown in the delivery of volumes drives up the average per-page
cost. The budget allocates funds to specific locations so that the respective libraries can
undertake extensive local planning and setup, and so that the appropriate number of scanning
stations to be delivered and staffed can be estimated. However, until the agreement for each
location is finalized, the BHL must reserve the flexibility to change the provisional
assignment of IA scanners to the locations reflected in the budget, based on last-minute
information and changes in local library situations. Any such changes will be made only with
the approval of the EoL Secretariat. The differing per-page charges reflect such factors as the
number of staffed scanning machines, currency conversion rates, and whether the scanning
machine is part of a wider regional shared facility.
Transport Costs
Many BHL libraries can achieve significantly lower per-page scanning costs if they use
off-site IA facilities shared with other libraries. However, they may incur
substantial costs to transport their materials to and from these facilities.
Meetings
A project of this scope requires extensive networking and sustained meeting time to
review progress and plan next steps. As the BHL ramps up, a forum will also be needed to
involve new partners.
Budget Notes
The requested funds will build the infrastructure and foundation for the BHL and produce a large
body of digitized literature as quickly as possible. Any funds beyond those requested can be
applied easily and immediately to increase the number of volumes scanned without significantly
increasing other costs, except, in some cases, transport and local library technician costs.
Roughly, every additional forty dollars will add another volume. Appendix A is the spreadsheet
bhlgrantfiveyear.xls.
Appendix B
Researchers' Comments Concerning BHL Digitization
1)
First of all congratulations for the Botanicus.org webpage, it's a very useful project specially if
you are located at countries far away from decent libraries (as I am in Costa Rica).
I like the PDF version of the books, as it can be downloaded; it is easy to read parts of the books
off-line.
Best regards
Walter Schug
2)
Thank you so much for not only your quick response to my ILL request, but even more for your
attaching the item as a .PDF file so that Prof. Newton received it almost instantly across the
ether. I know he emailed you a much more timely thank you. He wrote me that he is extremely
excited about your digitization project. At the moment he and his graduate botany students in
Kenya have access to very few resources. He spends his summer terms at Kew doing his
research for the next year's teaching and writing, but he tells me that now, because of what is
already on your site, he will not have to carry so much back to Kenya for his research and his
students but can download and work with your resources right there.
I am cutting this note short as I myself now am heading off for my summer break, a family
reunion back in Ohio, but I would like to send you a copy of Prof. Newton's original article, and
you can then see how much you have helped his research.
Emilie Pulver
University of Kenya
3)
I am absolutely amazed of this tool! I think it is fantastic what we can offer to the botanical
community and beyond to have at our finger tips. Congratulations to you and the staff involved.
Thanks for all your good work.
Carmen Ulloa
4)
In reference to: Bulletin des Séances de la Société Entomologique de France. [Paris] : Société
entomologique de France, [1873-1884]
My deepest gratitude for allowing me access to the digital version of the very rare "Bulletin des
Séances de la Société Entomologique de France". It has been very important for my work on the
database of the names of the butterflies of the world to be able to consult at leisure this series,
which is held by extremely few libraries in the world. I cannot stress enough the importance of
having access to electronic versions of the literature, especially to us researchers who cannot
benefit from well-endowed institutional libraries. The Smithsonian Libraries are doing a great
service to science by making openly accessible such crucial works as the "Biologia Centrali-Americana",
and now the above-mentioned "Bulletin". I only wish that there were many more
such electronic resources. Please keep up the excellent work!
Dr Gerardo Lamas
Museo de Historia Natural
Universidad Nacional Mayor de San Marcos
5)
In reference to: Frederick Ducane Godman and Osbert Salvin, eds. Biologia Centrali-Americana. [London: Pub. for the editors by R. H. Porter], 1879-1915.
I have to my position the collection of Coleoptera of the Faculty of Superior Studies of the
Independent National University of Mexico and daily I consult volumes of BCA for the
identification of specimens of Coleoptera, because it is a wonderful work and until the moment
does not exist another source that supports in the identification of the Mexican species of several
families of this group.
Ma. Magdalena Ordóñez Reséndiz
Museo de Zoología, FES Zaragoza, UNAM
6)
In reference to: Walter Rothschild. The Avifauna of Laysan and the neighbouring islands with a
complete history to date of the birds of the Hawaiian possession. London: R H Porter, 1893-1900
Aloha. I live on The Big Island of Hawai'i, a $300.00 plane ride away from Honolulu and the
Bishop Museum. Even when I can make it to the Museum (where I study the Hawaiian Bird
Skins), they do not have every single bird (moho apicalis, the Oahu moho is missing)….I have
been looking for this text for over TWENTY YEARS. Mahalo nui loa for all your hard work.
Reading these pages mean so much to me and many others. I hope they show there appreciation
as well. It truly is very important. I cannot thank you enough, nor stress the importance of your
website enough. Thank you for putting these items on the web, and in such a findable manner.
Aloha
Gwendolyn O'Connor
7)
In reference to: Howard Jones. Illustrations of the nests and eggs of birds of Ohio. Circleville,
Ohio, 1886.
Virginia Hunt, a doctoral candidate at LSU, will be using this site as part of her dissertation on
ornithological illustration; commented Ms. Hunt: “My study concerns surveying a large number
of ornithological narrative paintings, in particular historical examples, in order to determine how
these may be used by high school and college biology teachers to teach certain key concepts in
ecology while doing 'double duty' in illustrating subtle aspects of the history and nature of
science.”
Bruce Shelvey, Ph.D.; Chair, Department of Geography, History, and Political and International
Studies and Associate Professor of History and Political Studies, Trinity Western University
(Canada).