Merchants of Light, Depredators and Pioneers

John Unsworth
Wisconsin Distinguished Lecture
November 9, 2011
Libraries, researchers, and the battle for
institutional resources
Francis Bacon’s New Atlantis and reimagining roles in the research
Why big data calls for digital
humanities with Bacon (with some
A digital humanities center is an entity where
new media and technologies are used for
humanities-based research, teaching, and
intellectual engagement and experimentation.
The goals of the center are to further
humanities scholarship, create new forms of
knowledge, and explore technology's impact
on humanities-based disciplines.
DHCs can be grouped into two general categories:
• Center focused: Centers organized around a
physical location, with many diverse projects,
programs, and activities undertaken by faculty,
researchers, and students. These centers offer a
wide array of resources to diverse audiences.
Most DHCs operate under this model.
• Resource focused: Centers organized around a
primary resource, located in a virtual space,
that serve a specific group of members. All
programs and products flow from the resource,
and individual and institutional members help
sustain the resource by providing content,
labor, or other support services.
Of late, there is a growing interest in fostering greater
communication among centers to leverage their numbers
for advocacy efforts. However, few DHCs have
considered whether an unfettered proliferation of
individual centers is an appropriate model for advancing
humanities scholarship. Indeed, some features in the
current landscape of centers may inadvertently hinder
wider research and scholarship. These include …
The silo-like nature of current centers is creating
untethered digital production that is detrimental to the
needs of humanities scholarship. Today's centers favor
individual projects that address specialized research interests.
These projects are rarely integrated into larger digital
resources that would make them more widely known and
available for the research community. As a result, they
receive little exposure outside their center and are at
greater risk of being orphaned over time.
The independent nature of existing centers does not
effectively leverage resources community-wide. Centers
have overlapping agendas and activities, particularly in
training, digitization of collections, and metadata
development. Redundant activities across centers are an
inefficient use of the scarce resources available to the
humanities community.
Large-scale, coordinated efforts to address the "big"
issues in building a humanities cyberinfrastructure, such
as repositories that enable long-term access to the
centers' digital production, are missing from the current
landscape. Collaborations among existing centers are
small and focus on individual partner interests; they do
not scale up to address community-wide needs.
• When one is investigating collaborative models for
humanities scholarship, the sciences offer a useful
framework. Large-scale collaborations in the sciences
have been the subject of research that examines the
organizational structures and behaviors of these entities
and identifies the criteria needed to ensure their success.
The humanities should look to this work in planning its
own strategies for regional or national models of collaboration.
The New Atlantis:
• Inventing the research university
• Experiments without hypotheses
• Emphasis on observation
• No libraries, but implied librarians
The End of Theory:
• Out with hypotheses, taxonomies, ontologies, models
• In with statistics, correlation, patterns
• Emphasis on observation
• No libraries, but implied librarians
For the several employments and offices of
our fellows, we have twelve that sail into
foreign countries under the names of other
nations (for our own we conceal), who
bring us the books and abstracts, and
patterns of experiments of all other parts.
These we call Merchants of Light.
We have three that collect the experiments
which are in all books. These we call Depredators.
We have three that collect the experiments
of all mechanical arts, and also of liberal
sciences, and also of practices which are
not brought into arts. These we call Mystery-men.
We have three that try new experiments,
such as themselves think good. These
we call Pioneers or Miners.
We have three that draw the experiments
of the former four into titles and tables, to
give the better light for the drawing of
observations and axioms out of them.
These we call Compilers.
We have three that bend themselves,
looking into the experiments of their
fellows, and cast about how to draw out of
them things of use and practice for man's
life and knowledge, as well for works as
for plain demonstration of causes, means
of natural divinations, and the easy and
clear discovery of the virtues and parts of
bodies. These we call Dowry-Men or Benefactors.
Then after divers meetings and consults of
our whole number, to consider of the
former labors and collections, we have
three that take care out of them to direct
new experiments, of a higher light, more
penetrating into nature than the former.
These we call Lamps.
We have three others that do execute the
experiments so directed, and report them.
These we call Inoculators.
Lastly, we have three that raise the former
discoveries by experiments into greater
observations, axioms, and aphorisms.
These we call Interpreters of Nature.
He was carried in a rich chariot, without wheels, litter-wise,
with two horses at either end, richly trapped in blue velvet
embroidered; and two footmen on each side in the like attire.
The chariot was all of cedar, gilt and adorned with crystal; save
that the fore end had panels of sapphires set in borders of
gold, and the hinder end the like of emeralds of the Peru color.
There was also a sun of gold, radiant upon the top, in the midst;
and on the top before a small cherub of gold, with wings
displayed. The chariot was covered with cloth-of-gold tissued
upon blue. … Behind his chariot went all the officers and
principals of the companies of the city. He sat alone, upon
cushions, of a kind of excellent plush, blue; and under his foot
curious carpets of silk of divers colors, like the Persian, but far
finer. He held up his bare hand, as he went, as blessing the
people, but in silence.
In the HathiTrust as of 11/8/2011:
 9,728,814 total volumes
 5,164,518 book titles
 256,880 serial titles
 3,405,084,900 pages
 436 terabytes
 115 miles
 7,905 tons
 2,654,933 volumes (~27%
public domain
of total) in the
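As a rough check on these figures, the public-domain share and a couple of per-volume averages can be recomputed directly from the numbers listed. The following is only a sketch: the variable names are illustrative, and the megabyte figure assumes decimal units (1 terabyte = 1,000,000 megabytes), which the slide does not specify.

```python
# Figures copied from the HathiTrust list above (as of 11/8/2011).
total_volumes = 9_728_814
public_domain_volumes = 2_654_933
total_pages = 3_405_084_900
terabytes = 436

# Share of all volumes that are in the public domain.
public_domain_share = public_domain_volumes / total_volumes

# Average volume size in pages and in megabytes.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
pages_per_volume = total_pages / total_volumes
megabytes_per_volume = terabytes * 1_000_000 / total_volumes

print(f"public domain share: {public_domain_share:.1%}")
print(f"pages per volume:    {pages_per_volume:.0f}")
print(f"MB per volume:       {megabytes_per_volume:.1f}")
```

The page total divides evenly into 350 pages per volume, which suggests the page count is an estimate derived from the volume count rather than a direct tally.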
… is dedicated to the provision of computational
access to a comprehensive body of published
works for scholarship and education.
Phase I, 1 July 2011 – 31 December 2012: utilize
existing tools and infrastructure to enable HTRC
functionality among partner sites (IU and NCSA).
Phase II, start date 01 January 2013: develop an
operational research center that will provide
ongoing and up-to-date access to the HTRC
research corpus and associated indices.
is working with a 50,000 volume
collection of materials digitized from the IU
library and a 250,000 volume collection of
non-Google digitized content.
• HTRC-Indiana received a 3-year grant from
the Alfred P. Sloan Foundation to prototype a
system that proves experimentally and
theoretically that it is possible to comply
with the non-consumptive constraint in
computational research on copyrighted
NYPL Digital Gallery provides free and open
access to over 700,000 images digitized from
The New York Public Library, including
illuminated manuscripts, historical maps, vintage
posters, rare prints, photographs and more.
ARTstor’s Shared Shelf has more than two
million images uploaded from 150 colleges,
universities, and museums plus ARTstor’s own
collections of more than 1.2 million images in art,
architecture, humanities, and social sciences.
Stephen Downie’s “Structural Analysis of
Large Amounts of Music Information
(SALAMI)” project:
“The SALAMI project is an endeavor to use
music structure algorithms to annotate and
segment a large corpus of music (on the
order of 300,000 songs).”
--Andreas Ehmann, Mert Bay, Stephen Downie, Ichiro Fujinaga, David De Roure,
Music Structure Segmentation Algorithm Evaluation: Expanding On Mirex 2010
Analyses And Datasets. Proceedings of the 12th International Society for Music
Information Retrieval Conference (ISMIR 2011).
• The Moving Image Archive,
with about 585,000 videos available,
including animation, ephemera, feature
films, and community-created video.
• Less accessible but more extensive: the
Vanderbilt Television News Archive,
recording and preserving daily national
network news programs since 1968.
David Rumsey’s Map Collection “has over
28,000 maps and images online. The
collection focuses on rare 18th and 19th
century North American and South American
maps and other cartographic materials.
Historic maps of the World, Europe, Asia, and
Africa are also represented.”
The Australian National Library makes
available 10,200 digitized maps of Australia,
dating from 1541 to 1954.
• The National Archives has a digitization
strategy that includes non-exclusive
partnerships with commercial services
like (“the web’s premier
collection of [about 79 million] original
military records”) and
(“the world’s largest online family history
resource” with about 7 billion records
from around the world).
• We do need non-profit educational
institutions to manage repositories of
cultural heritage data, but that data will
often be produced in public/private partnerships.
• As these digital cultural heritage
collections grow large, we will need
computational methods to do meaningful
work with them.
• It’s a mistake to pit libraries against
digital humanities centers in a contest for
local institutional resources. Libraries do
need to collaborate to create shared
cyberinfrastructure but they also need to
be part of the local support provided for
researchers who are trying to use that
An age of (data) abundance presents real
opportunities for librarians, IT professionals,
information scientists and, most of all, for
humanities scholars who can harness
computational methods.
To make the most of those opportunities, we
need to think about new functional roles in
collaborative research, and our thinking
shouldn’t be limited by organizational
histories and preconceptions.