John Unsworth
Wisconsin Distinguished Lecture
November 9, 2011

1. Libraries, researchers, and the battle for institutional resources
2. Francis Bacon’s New Atlantis and reimagining roles in the research university
3. Why big data calls for digital humanities with Bacon (with some examples)

http://www.clir.org/pubs/abstract/pub143abst.html

A digital humanities center is an entity where new media and technologies are used for humanities-based research, teaching, and intellectual engagement and experimentation. The goals of the center are to further humanities scholarship, create new forms of knowledge, and explore technology's impact on humanities-based disciplines.

DHCs can be grouped into two general categories:

Center focused: Centers organized around a physical location, with many diverse projects, programs, and activities undertaken by faculty, researchers, and students. These centers offer a wide array of resources to diverse audiences. Most DHCs operate under this model.

Resource focused: Centers organized around a primary resource, located in a virtual space, that serve a specific group of members. All programs and products flow from the resource, and individual and institutional members help sustain the resource by providing content, labor, or other support services.

Of late, there is a growing interest in fostering greater communication among centers to leverage their numbers for advocacy efforts. However, few DHCs have considered whether an unfettered proliferation of individual centers is an appropriate model for advancing humanities scholarship. Indeed, some features in the current landscape of centers may inadvertently hinder wider research and scholarship. These include …

The silo-like nature of current centers is creating untethered digital production that is detrimental to the needs of humanities scholarship. Today's centers favor individual projects that address specialized research interests.
These projects are rarely integrated into larger digital resources that would make them more widely known and available for the research community. As a result, they receive little exposure outside their center and are at greater risk of being orphaned over time.

The independent nature of existing centers does not effectively leverage resources community-wide. Centers have overlapping agendas and activities, particularly in training, digitization of collections, and metadata development. Redundant activities across centers are an inefficient use of the scarce resources available to the humanities community.

Large-scale, coordinated efforts to address the "big" issues in building a humanities cyberinfrastructure, such as repositories that enable long-term access to the centers' digital production, are missing from the current landscape. Collaborations among existing centers are small and focus on individual partner interests; they do not scale up to address community-wide needs.

When one is investigating collaborative models for humanities scholarship, the sciences offer a useful framework. Large-scale collaborations in the sciences have been the subject of research that examines the organizational structures and behaviors of these entities and identifies the criteria needed to ensure their success. The humanities should look to this work in planning its own strategies for regional or national models of collaboration.
The New Atlantis: Inventing the research university
Experiments without hypotheses
Emphasis on observation
No libraries, but implied librarians

The End of Theory: Out with hypotheses, taxonomies, ontologies, models
In with statistics, correlation, patterns
Emphasis on observation
No libraries, but implied librarians

For the several employments and offices of our fellows, we have twelve that sail into foreign countries under the names of other nations (for our own we conceal), who bring us the books and abstracts, and patterns of experiments of all other parts. These we call Merchants of Light.
We have three that collect the experiments which are in all books. These we call Depredators.
We have three that collect the experiments of all mechanical arts, and also of liberal sciences, and also of practices which are not brought into arts. These we call Mystery-Men.
We have three that try new experiments, such as themselves think good. These we call Pioneers or Miners.
We have three that draw the experiments of the former four into titles and tables, to give the better light for the drawing of observations and axioms out of them. These we call Compilers.
We have three that bend themselves, looking into the experiments of their fellows, and cast about how to draw out of them things of use and practice for man's life and knowledge, as well for works as for plain demonstration of causes, means of natural divinations, and the easy and clear discovery of the virtues and parts of bodies. These we call Dowry-Men or Benefactors.
Then after divers meetings and consults of our whole number, to consider of the former labors and collections, we have three that take care out of them to direct new experiments, of a higher light, more penetrating into nature than the former. These we call Lamps.
We have three others that do execute the experiments so directed, and report them. These we call Inoculators.
Lastly, we have three that raise the former discoveries by experiments into greater observations, axioms, and aphorisms. These we call Interpreters of Nature.

He was carried in a rich chariot, without wheels, litter-wise, with two horses at either end, richly trapped in blue velvet embroidered; and two footmen on each side in the like attire. The chariot was all of cedar, gilt and adorned with crystal; save that the fore end had panels of sapphires set in borders of gold, and the hinder end the like of emeralds of the Peru color. There was also a sun of gold, radiant upon the top, in the midst; and on the top before a small cherub of gold, with wings displayed. The chariot was covered with cloth-of-gold tissued upon blue. … Behind his chariot went all the officers and principals of the companies of the city. He sat alone, upon cushions, of a kind of excellent plush, blue; and under his foot curious carpets of silk of divers colors, like the Persian, but far finer. He held up his bare hand, as he went, as blessing the people, but in silence.

In the HathiTrust as of 11/8/2011:
9,728,814 total volumes
5,164,518 book titles
256,880 serial titles
3,405,084,900 pages
436 terabytes
115 miles
7,905 tons
2,654,933 volumes (~27% of total) in the public domain
-- http://www.hathitrust.org/

… is dedicated to the provision of computational access to a comprehensive body of published works for scholarship and education.
Phase I, 1 July 2011 – 31 December 2012: utilize existing tools and infrastructure to enable HTRC functionality among partner sites (IU and NCSA).
Phase II, start date 01 January 2013: develop an operational research center that will provide ongoing and up-to-date access to the HTRC research corpus and associated indices.
HTRC is working with a 50,000-volume collection of materials digitized from the IU library and a 250,000-volume collection of non-Google digitized content.
HTRC-Indiana received a 3-year grant from the Alfred P.
Sloan Foundation to prototype a system that proves experimentally and theoretically that it is possible to comply with the non-consumptive constraint in computational research on copyrighted materials.

NYPL Digital Gallery provides free and open access to over 700,000 images digitized from The New York Public Library, including illuminated manuscripts, historical maps, vintage posters, rare prints, photographs and more. -- http://digitalgallery.nypl.org/nypldigital/index.cfm

ARTstor’s Shared Shelf has more than two million images uploaded from 150 colleges, universities, and museums, plus ARTstor’s own collections of more than 1.2 million images in art, architecture, the humanities, and the social sciences. -- http://www.artstor.org/shared-shelf/s-html/shared-shelf-home.shtml

Stephen Downie’s “Structural Analysis of Large Amounts of Music Information (SALAMI)” project: “The SALAMI project is an endeavor to use music structure algorithms to annotate and segment a large corpus of music (on the order of 300,000 songs).” -- Andreas Ehmann, Mert Bay, Stephen Downie, Ichiro Fujinaga, David De Roure, “Music Structure Segmentation Algorithm Evaluation: Expanding on MIREX 2010 Analyses and Datasets,” Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011).

The Moving Image Archive (at archive.org), with about 585,000 videos available, including animation, ephemera, feature films, and community-created video. -- http://www.archive.org/details/movies

Less accessible but more extensive: the Vanderbilt Television News Archive, recording and preserving daily national network news programs since 1968. -- http://tvnews.vanderbilt.edu/

David Rumsey’s Map Collection “has over 28,000 maps and images online. The collection focuses on rare 18th and 19th century North American and South American maps and other cartographic materials.
Historic maps of the World, Europe, Asia, and Africa are also represented.” -- http://www.davidrumsey.com/

The Australian National Library makes available 10,200 digitized maps of Australia, dating from 1541 to 1954. -- http://www.nla.gov.au/digicoll/maps.html

The National Archives has a digitization strategy that includes non-exclusive partnerships with commercial services like fold3.com (“the web’s premier collection of [about 79 million] original military records”) and ancestry.com (“the world’s largest online family history resource” with about 7 billion records from around the world).

We do need non-profit educational institutions to manage repositories of cultural heritage data, but that data will often be produced in public/private partnerships.

As these digital cultural heritage collections grow large, we will need computational methods to do meaningful work with them.

It’s a mistake to pit libraries against digital humanities centers in a contest for local institutional resources.

Libraries do need to collaborate to create shared cyberinfrastructure, but they also need to be part of the local support provided for researchers who are trying to use that cyberinfrastructure.

An age of (data) abundance presents real opportunities for librarians, IT professionals, information scientists and, most of all, for humanities scholars who can harness computational methods.

To make the most of those opportunities, we need to think about new functional roles in collaborative research, and our thinking shouldn’t be limited by organizational histories and preconceptions.