Opening Keynote
TELDAP International Conference 2012
Taipei, Taiwan
February 21, 2012
Donald J. Waters
Program Officer
Scholarly Communications and Information Technology
The Andrew W. Mellon Foundation
This paper briefly describes examples of large-scale digital projects in three areas of humanistic study: Classical studies, Medieval studies, and Renaissance and Early Modern studies. The projects illustrate how the application of technology and new divisions of labor among institutions and individuals affect the deployment and creative use of primary sources in these three fields. The paper then highlights some of the larger implications of these projects for scholarly communications, particularly the requirements for systems of data curation in the humanities, particular forms of publication, and new kinds of infrastructure.
Introduction
The Taiwan E-Learning and Digital Archive Project, or TELDAP, has ambitious objectives: to showcase the cultural, social, and biological diversity of Taiwan; to promote the application of technologies and digital content; to establish digital archives and e-learning industries; and to encourage international collaboration. In the context of these objectives, my task in this paper is to launch the 2012 TELDAP international conference by addressing its three-part theme: Convergence, Collaboration, and Creativity.
Although one can readily nod in agreement that these concepts are important in principle to the success of academic projects involving digital technologies, we also know that they are elusive in practice. Many digital projects promise to bring convergence to areas of study and to important academic resources that previously have been disconnected. However, we can all identify projects in which decisions about direction, software, user interfaces, or digital content have generated more heavily barricaded silos than those that existed previously.
Technology-enabled projects also open many possibilities for varied and potentially fruitful collaborations between scholars and technologists, and among scholars who have previously been separated by disciplinary focus, distance, language, and other seemingly impermeable barriers. However, we also know that technology is no guarantee that collaborations will actually multiply and bear fruit. Indeed, one of the biggest barriers to collaboration is the not-invented-here syndrome: the conviction that “you and I will collaborate just fine if you adopt my system and abandon yours.”
Waters, Digital Humanities and the Changing Ecology of Scholarly Communications 2
As for the third ‘C’, Creativity, we can very quickly find ourselves in murky territory.
The technologically disposed among us often exude confidence that the latest tool or system can foster the creation of new knowledge that was previously impossible to grasp, or can do so with heretofore unmatched ease and efficiency. Examples of these positive effects are easy to find, but so are examples of promising technologies that had little effect on creativity and died for lack of use. In a recent book called Reinventing Knowledge: From Alexandria to the Internet, historians Ian McNeely and Lisa Wolverton wisely caution that ‘innovation in technology does little in itself to guarantee the progress of knowledge as a whole’. Focusing on the Internet, they write: ‘We risk committing a serious error by thinking that cheap information made universally available through electronic media fulfills the requirements of a democratic society for organized knowledge’.
1
The relationship between technology and human creativity is not as predictable as we might like because it is complex and subject to a host of other variables. Technology is only one factor to consider when we are thinking about human creativity. Among the other factors, according to McNeely and Wolverton, are ‘the ways in which we organize intellectual activity’. The authors survey the history of several such modes of organization, including the library, the university, the so-called republic of letters, academies, and the laboratory, and conclude that none of these represents a ‘culmination or pinnacle’ of human efforts to advance knowledge. Instead, the history of knowledge creation is ‘discontinuous, full of paths not taken’. As McNeely and Wolverton explain, ‘students in Europe and India once gravitated to verbal disputation, not passive listening to a lecture, as the heart of learning’. In addition, gentlemen in Europe and China, who were the precursors to today’s citizen scientists, ‘once carved leisure time out of busy lives full of social obligations in order to carry on academic traditions. [And] scientists in Europe and Islam once saw the mastery and manipulation of nature as complementary, not antagonistic, to religious and humanistic understanding’.
2
Given these various cautions, we must approach the thematic concerns of this conference with a spirit of skepticism and humility. The topics are big and complex. Achieving convergence, promoting collaboration, and stimulating creativity in e-learning projects, and in building and preserving content in the humanities, social sciences, and sciences, is difficult and requires hard work sustained over time. There are no easy answers, but finding some answers is the challenge that the conference organizers have put before us. So let me begin with a brief description of three projects in which I have been fortunate to be involved over the last decade. They are beginning to converge in interesting ways, bearing the fruits of certain kinds of collaboration and some very creative thinking.
Three Case Studies
The projects on which I want to focus attention are drawn from three areas of humanistic study: classical, medieval, and renaissance and early modern studies. My involvement in these initiatives is as a funder. The Andrew W. Mellon Foundation, for which I work, is a grant-making organization in the US. It directs its support primarily to the humanities in higher education and to the performing arts. I am responsible for the grant-making area that covers scholarly communications and information technology. We provide support for research libraries and scholarly publishers, particularly university presses, but we also work directly with scholars in particular fields of study as they build digital content and tools to use in their research and teaching.
3
In the scholarly communications program, the Mellon Foundation does not support projects that serve the interests of only one or two scholars. Instead, we seek out those scholars who are reimagining the research and teaching agenda in their fields and thinking broadly about how digital content and tools would support a community of scholars working together to pursue that agenda. Our objective is neither mass digitization nor the development of basic repository, search, and discovery technologies; our collective understanding of these issues is more or less well advanced.
Much less is understood about how scholars would use the digital content and tools to advance knowledge in their fields and what the requirements of such use are, especially for ongoing curation, publishing, and shared infrastructure.
The scholarly resources in the three projects that I describe below are primarily textual in nature. However, many of the issues raised apply to visual, audio, moving image and other types of collections. In the discussion that follows the case studies, I briefly examine some of the implications for these other kinds of resources.
1. Classical Studies: Integrating Digital Papyrology
The first project that I want to highlight is in the field of classics and is called Integrating Digital Papyrology.
4
Classical Studies is the branch of the humanities that seeks to understand the culture and society of the ancient Mediterranean world from the Bronze Age (approximately 3000 BCE) until Late Antiquity (ending about 600 CE). A relatively small but important subfield of classical studies is papyrology, which focuses on the social and cultural documentation of the ancient Mediterranean that survives on papyri.
As I have already indicated, digital technology makes it possible for scholars to bring together related scholarly resources online, where the whole becomes more powerful and efficient than the sum of the separate parts. However, institutional and personal egos and other factors often get in the way of promising mergers, and they have done so until recently in the field of papyrology, where three complementary and overlapping online databases have remained stubbornly apart.
Established thirty years ago, the oldest of the three digital databases is the Duke Databank of Documentary Papyri. It is maintained at Duke University in the US and contains online transcriptions of Greek and Latin documentary (but not literary) papyri.
For essentially the same set of papyri, the University of Heidelberg in Germany manages the HGV, Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens, a database of detailed bibliographic descriptions of the texts, but not the texts themselves. The third system is APIS, the Advanced Papyrological Information System, which is based at Columbia University and allows scholars to search and display digital images of papyri and related metadata.
About eight years ago, representatives of each of these resources finally agreed to unite the papyrological images, transcriptions, and bibliographic information from the three databases into a common online system. These databases have now all been encoded using a common eXtensible Markup Language (XML) standard called EpiDoc, which was originally developed for the related field of epigraphy. Commonly encoded, they are accessible and cross-searchable through a newly developed shared interface called the Papyrological Navigator.
As one can imagine, as soon as scholars started viewing transcriptions and detailed bibliographic descriptions together and in a context where they could also see images of the original papyri, they were annoyed by errors and began to contest the transcriptions.
The project team responded by developing an editing environment based on an existing editor developed for scholars working on the text of the Suda, a 10th-century encyclopedia. Using this new editor—called Son of Suda online (SoSol)—registered users are now able to submit corrections, alternate readings, and translations of papyrological fragments as well as new, previously unpublished fragments. An editorial board of papyrologists vets these submissions and authorizes special “super-editors” to make approved changes to the databases. Meticulously recording each step, the editing system makes it possible for contributions to receive attribution, and for contributors to be cited and accorded credit at a fine-grained level.
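The submit, vet, and apply cycle described above can be sketched in a few lines of Python. This is a hypothetical illustration, not SoSol's actual code or data model: the class names, status values, and the in-memory dictionary standing in for the shared databases are all assumptions. What it shows is how recording each step together with the person who took it makes attribution possible at a fine-grained level.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


@dataclass
class Submission:
    """A hypothetical proposed change: a correction, alternate reading, or translation."""
    contributor: str
    text_id: str
    change: str
    status: Status = Status.SUBMITTED
    # (actor, action) pairs; this log is what lets contributors be cited and credited.
    history: list = field(default_factory=list)

    def __post_init__(self):
        self.history.append((self.contributor, "submitted"))

    def review(self, board_member: str, approve: bool) -> None:
        """A member of the editorial board vets the submission."""
        if self.status is not Status.SUBMITTED:
            raise ValueError("only pending submissions can be reviewed")
        self.status = Status.APPROVED if approve else Status.REJECTED
        self.history.append((board_member, self.status.value))

    def apply(self, super_editor: str, database: dict) -> None:
        """A super-editor writes an approved change into the shared database."""
        if self.status is not Status.APPROVED:
            raise ValueError("only approved submissions can be applied")
        # Store the change with its contributor so attribution travels with the data.
        database.setdefault(self.text_id, []).append((self.change, self.contributor))
        self.status = Status.APPLIED
        self.history.append((super_editor, "applied"))


# Usage with invented identifiers:
db: dict = {}
sub = Submission("contributor_a", "text-001", "alternate reading for line 3")
sub.review("board_member_b", approve=True)
sub.apply("super_editor_c", db)
print(sub.history)
# [('contributor_a', 'submitted'), ('board_member_b', 'approved'), ('super_editor_c', 'applied')]
```

The guard clauses encode the division of labor: contributors cannot bypass the board, and nothing reaches the shared database without an approval recorded in the log.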
Enthusiasm for the new integrated system and editing environment has been so great among papyrologists that, even before the official release of the editing system, scholars who attended training workshops and tested the tool had entered approximately 350 new texts and suggested emendations to many previously published papyri. Several classicists in Europe and North America are now using SoSol in graduate seminars, and a team in Berlin is using the system to transcribe and edit 6,000 digitized papyri, including 2,000 unpublished documents. Additional technical improvements now underway include enhanced mechanisms for reporting data-entry errors; extended capabilities for a critical apparatus that permits users to note alternate readings and the decisions of previous editors; and new workflows and interfaces that would allow a discrete subgroup, such as a graduate seminar, to use the system for the collaborative editing of a group of papyri. The resulting edition would then be vetted by members of the subgroup before being submitted to the editorial board. The project team is also exploring how the editing platform could be used for papyri in other languages, such as Arabic, and for ancient texts other than papyri.
2. Medieval Studies: The “Shared Canvas”
Allow me now to turn to the field of Medieval Studies. Among the primary sources on which medievalists rely are collections of manuscripts produced mainly in European monasteries between approximately 600 CE and 1500 CE. Several hundred thousand manuscripts are known to have survived. These are widely dispersed, and with Mellon Foundation support scholars have digitized several important collections.
One of these is the Parker collection. Matthew Parker (1504-1575), the Archbishop of Canterbury under Elizabeth I, was one of the great figures of Christian humanism in Renaissance England. He assembled a collection of 546 manuscripts in the middle of the 16th century from various Catholic monasteries that were closing under pressure from the Church of England. The collection has remained intact in the Parker Library at Corpus Christi College at the University of Cambridge, and includes some of the oldest works in the English language, plus many in Latin and Anglo-Saxon, some of the oldest examples of English art, and other documentation fundamental to the history and identity of the English-speaking peoples. Cambridge, Corpus Christi, and Stanford University collaborated to digitize all the volumes in the Parker collection and to make them available online in the Parker Library on the Web.
5
The second is a collection of variants of a work called the Roman de la Rose. Written by two authors at different times during the 13th century, the Roman de la Rose represented a new form of literary romance and challenged prevailing attitudes about how poets could talk about love in literature. The vivid language of the poem prompted the production of hundreds of manuscripts of the Rose, most of them richly illuminated, from the 13th to the early 16th century. More than 250 Rose manuscripts survive today in several languages, but they are widely dispersed, making scholarly comparisons among different versions very difficult. Led by scholars at the Johns Hopkins University and the Bibliothèque nationale de France (BnF), and with the help of collaborators at a variety of other institutions, the Rose project team has created a digital library of 149 of the more than 250 extant manuscripts.
6
A third collection has been assembled by the e-codices project in Switzerland. One objective of this project is to produce high-quality digital versions of all surviving manuscripts in Switzerland, but a core subset of this collection is a digital reconstruction of the ninth-century library holdings that represent the intellectual culture of the monastic society that produced and drew inspiration from the famous Plan of St. Gall. Containing idealized instructions for the arrangement of a monastery according to Benedictine principles, the Plan is the most detailed and comprehensive architectural manuscript from the Middle Ages. In collaboration with the Abbey Library of St. Gall, the Medieval Institute at the University of Fribourg has created, maintains, and makes freely accessible on the Internet, within the larger e-codices database, digital versions of more than 155 of the 355 manuscripts known to have been in the Abbey Library prior to 1000 CE. Besides the Plan itself, these manuscripts include an original copy of the Rule of St. Benedict, which was probably written in St. Gall around 820; the oldest German book, written around 790; and the work of Notker the German, a St. Gall monk and noted scholar who translated and provided commentary on Boethius’ The Consolation of Philosophy. Equally significant are the music manuscripts, the liturgical and legal codices, and twelve manuscripts that deal exclusively with medical subject matter.
7
It will surely come as no surprise that each of the Parker, Rose, and e-codices projects has developed its own set of applications and tools for searching, viewing, and analyzing its own corpus of manuscripts. They quickly became silos of the classic kind, and as the remarkable materials of each digital library have come increasingly to the attention of other medievalists, pressure has mounted to make the systems interoperable and easily usable by as many scholars and students as possible in a wide range of intellectual pursuits. An initial attempt at interoperability failed when the parties fell into the not-invented-here trap, each trying to persuade the others to adopt all or parts of its system. A more workable solution emerged recently when the collaborating technologists, scholars, and librarians agreed to make a rigorous conceptual distinction between the digitized manuscripts (images and metadata) and the applications and tools that are built to display and study them. Given this distinction, they then constructed a standard Web-based service interface through which, on the one hand, repositories could deliver their manuscript images and metadata and, on the other, scholars could access these data using whatever applications and tools they preferred, as long as those applications and tools incorporated the standard interface.
A key component of the proposed Web-based interface is a ‘resource map’, or what is now called a ‘shared canvas’. The ‘shared canvas’ is a standard method, based on the Linked Data principles of the Semantic Web, by which repositories describe the names, order, and structure of their manuscript files. Developers have encoded and tested the protocols by which repositories present the ‘shared canvas’ to scholars and then deliver specific manuscript pages for use with specific tools, such as viewers with page-turning capabilities and more complicated editing, transcription, and annotation systems.
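The separation between a repository's description of a manuscript and the tools that use it can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the actual shared canvas model expresses these relationships as Linked Data, whereas here the class names (`Canvas`, `Manifest`, `PageTurner`), fields, folio labels, and example.org URLs are hypothetical stand-ins. The point is that the viewer depends only on the manifest's order and structure, so any repository that publishes the same kind of description could feed it.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """An abstract page surface; an image is associated with it rather than being the page."""
    label: str      # e.g. a folio label such as "fol. 1r"
    image_url: str  # where the repository serves an image of this page


@dataclass
class Manifest:
    """Hypothetical resource map: the names, order, and structure of one manuscript."""
    manuscript_id: str
    canvases: list = field(default_factory=list)  # pages in reading order
    ranges: dict = field(default_factory=dict)    # named section -> range of canvas indices


class PageTurner:
    """A minimal viewer that relies only on the Manifest, not on repository internals."""

    def __init__(self, manifest: Manifest):
        self.manifest = manifest
        self.position = 0

    def current(self) -> Canvas:
        return self.manifest.canvases[self.position]

    def next_page(self) -> Canvas:
        # Stay on the last page rather than running off the end.
        if self.position < len(self.manifest.canvases) - 1:
            self.position += 1
        return self.current()

    def jump_to(self, section: str) -> Canvas:
        # Use the repository's structural description to go to a named section.
        self.position = self.manifest.ranges[section].start
        return self.current()


# Usage with invented labels and placeholder URLs:
pages = [Canvas(f"fol. {n}{side}", f"https://example.org/ms1/{n}{side}.jpg")
         for n in (1, 2) for side in ("r", "v")]
ms = Manifest("ms1", pages, {"Prologue": range(0, 2), "Text": range(2, 4)})
viewer = PageTurner(ms)
print(viewer.next_page().label)      # fol. 1v
print(viewer.jump_to("Text").label)  # fol. 2r
```

A transcription or annotation tool would consume the same `Manifest` in its own way; neither side needs to adopt the other's software.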
8
One tool that incorporates the ‘shared canvas’ model and is at an advanced stage of development is called Transcription for Paleographical and Editorial Notation, or T-PEN.
9
With Mellon Foundation support, development of the ‘shared canvas’ model has proceeded in stages that are closely tied to a series of scholarly projects chosen both for their intellectual value and for their ability to serve as so-called ‘use cases’ that exercise particular components of the model. Currently, nine of these use-case projects are underway. Because additional repositories have signaled their interest in adopting the shared canvas protocol, another round of development and use-case projects is being planned. Moreover, following an exploratory workshop last year, plans are being made to extend the shared canvas model to digital images other than those that represent medieval manuscripts.
10
Such an extension would be especially useful given the increasing number of image-based collections that museums, archives, and libraries are now making accessible on a number of different platforms, using a variety of different technical standards.
3. Renaissance and Early Modern Studies: Emergence of Printed Genre
The third and final project that I want to highlight comes from the field of Early Modern Studies. In Europe's early modern period, from approximately 1470 to 1800, economic, political, and colonial developments intersected with a variety of important cultural changes, including the shifting use of vernacular languages, the spread of print, the standardization of spelling, the rise of science, the appearance of the novel, and the organization of a literary public sphere. These intersections have given rise to many large questions for traditional humanistic study about the relationships among changing social conditions, the cultural domains in which print circulates, and the diversification of genres of discourse over time. For example, when does scientific discourse appear, and from what sources? Do genres like the novel have consistent linguistic fingerprints over time? When does a recognizable American literature emerge?
Study of these questions has traditionally depended on the assembly, mastery, and characterization of relatively limited bodies of evidence, often in the form of canonical texts, such as the corpus of Shakespeare or similarly celebrated authors. However, several important developments have converged to make it possible for scholars to explore different kinds of answers. For example, systematic digitization of early printed books and other materials from the period by Google and other companies has made the evidence much more accessible. In addition, rapid improvements in optical character recognition (OCR) have made it easier to obtain machine transcriptions. Much work is still needed, and some is already underway, to improve OCR accuracy further and to develop more efficient and cost-effective mechanisms for correcting the errors that persist, but recent progress has been astounding.
11
Finally, new computer-based analytical tools now make it possible to examine, compare, and contrast the features of much larger corpora of cultural output from the period.
One interdisciplinary group of scholars at the University of Wisconsin at Madison uses a tool called Docuscope, originally developed at Carnegie Mellon University, to apply linguistic tags automatically to the digitized and OCR-based transcriptions of early modern English-language texts. They then apply the statistical technique of principal component analysis to the tagged texts and, using additional tools to visualize the results, identify promising hypotheses about the characteristics of the texts. The hypotheses are tested by further statistical analysis, as well as by close reading and interpretation of those texts in the larger corpus that the statistical techniques suggest are related.
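The tag-then-analyze pipeline described above can be sketched with NumPy. This is a hypothetical illustration, not Docuscope or the Wisconsin team's code: the count matrix is invented, its three columns stand in for whatever linguistic categories a tagger actually produces, and the principal components are computed here via a singular value decomposition of the centered relative-frequency matrix, which is one standard way to perform PCA.

```python
import numpy as np


def pca_scores(counts: np.ndarray, n_components: int = 2):
    """Project texts (rows) onto their leading principal components.

    counts: one row per text, one column per linguistic tag category.
    Every row must have a nonzero total, or the normalization divides by zero.
    """
    # Convert raw tag counts to relative frequencies so longer texts don't dominate.
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # Center each tag category on its mean across the corpus.
    centered = freqs - freqs.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    # Share of total variance carried by each retained component.
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained


# Invented counts for four texts over three hypothetical tag categories.
counts = np.array([
    [30, 10, 5],
    [28, 12, 6],
    [5, 20, 30],
    [6, 22, 28],
], dtype=float)
scores, explained = pca_scores(counts)
# Texts 0 and 1 fall on one side of the first component, texts 2 and 3 on the other;
# the sign of a component is arbitrary, so only the grouping is meaningful.
print(scores[:, 0], explained)
```

In the workflow described in the text, a clustering like this is a hypothesis that prompts further statistical testing and close reading of the grouped texts, not a conclusion in itself.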
The Wisconsin team initially applied these methods to re-examine the characteristics of genre in the works of Shakespeare by comparing them to a larger set of 300 other plays written during the period. The results of this research, published in part in a recent issue of Shakespeare Quarterly, raise new questions, such as: why do so many of Shakespeare’s plays, including the tragedies and histories, cluster linguistically with the comedies of the larger group?
12
Given these provocative results, the Wisconsin team is now developing Docuscope and other analytical tools for broader use. They are testing the feasibility of applying them to significantly larger corpora, beginning with a corpus of 1,000 English-language texts, rather than 500, and then scaling up to 10,000 texts. After evaluating the technical developments and the scholarly research results, they plan to develop a longer-term research agenda and technical development plan.
13
Implications: The Changing Ecology of Scholarly Communications
The description of these three projects in the digital humanities has been brief but, I hope, sufficient to convey the ways in which resources and technologies can converge to good effect with persistent effort, even after previous failed attempts. It should also be evident that, to produce meaningful results, these initiatives have depended on rich collaboration among different institutions and individuals with various skill sets. Finally, I hope that I have made palpable the creative energy that the participants in each project are bringing to their common intellectual pursuits.
However, I think we can tease out even more significant features from these cases, because scholarly communications begins to change when scholars increasingly rely on digital sources for their research and teaching. First, the dynamic ways in which papyrologists, medievalists, and early modernists engage with digital primary sources suggest that an emerging model of data curation may be more appropriate than the special collections model on which scholars have traditionally relied when seeking help from librarians and other information professionals about primary sources. Second, the ways in which they communicate about these sources with each other, with students, and with the public alter the publishing functions normally associated with academic publishers. Finally, the lessons of these cases suggest the need for key new elements in the academic infrastructure. Let me elaborate.
1. Curation
First, the case studies I have presented offer important clues to the emerging nature and function of digital data curation in the humanities. Many, if not most, humanistic disciplines and fields of study are (and have been for centuries) reliant on the intensive use of primary source information, and thus on the scholarly communications functions of collecting, organizing, preserving, and enabling access to those sources. As in the sciences and social sciences, humanists, like those in the three cases I presented, now face the task of realigning, for the digital environment, the methods and resources required for managing primary sources and the research and teaching based on them. For libraries and archives, the protective treatment of primary sources in physical form as special collections is inappropriate for copies that are available in digital form.
As Jerry McGann, professor of English at the University of Virginia, has written, this change may prove to be a daunting task. ‘In the next 50 years,’ he argues, ‘the entirety of our inherited archive of cultural works will have to be re-edited within a network of digital storage, access, and dissemination’.
14
In the sciences and social sciences, the realignment of how custodians manage primary sources in the digital environment is increasingly discussed under the heading of data curation. But what does “data” mean in the context of these discussions? The term strongly suggests numerical and structured information. However, the temptation in the digital age is to broaden the definition to the bit level, at which everything is reduced to data in the form of zeros and ones. Given the cases I have presented, I would suggest that the concept of ‘data’ be defined neither numerically nor at the bit level, but ecumenically, as ‘the primary sources that fuel teaching and research’.
In the humanities, data in this sense include but are not limited to papyri, medieval manuscripts, and early printed books. They may also include artworks, architectural plans and monuments, music, field notes, photographs, video, newspapers, television and radio broadcasts, film and film scripts, costume, and Web sites. There are surely advantages to limiting scope in any discussion of data, but scientific data now also come in many of the forms in which these humanistic data are represented and collected digitally.
Moreover, scanners, optical character recognition, and other digital conversion and capture instruments and sensors have greatly expanded the nature and scale of primary sources to be digested, managed, and analyzed in all sectors. In the sciences especially, there is a tendency to treat these data curation issues as novel and as requiring a new paradigm that depends heavily on the data-mining, pattern-matching, and simulation capabilities of high-performance computing.
15
With all due respect to the late Jim Gray and the many admirers of his notion of a so-called ‘fourth paradigm’ for research, even if these experiences of and approaches to data are in fact new to scientists, they are all too familiar to any humanist who has tried to fathom the depths of a historical archive, comprehend the semantics of an unknown language, describe the social rituals of a foreign culture, reconstruct social practice from archaeological remains, or interpret the context of architectural production. Deep dives into formless data, pattern-matching to organize and render those data more manageable, and story-telling, which after all is at the core of all simulation, are the stock in trade of humanistic scholarship, and there are centuries of philological scholarship on the theory and method of data curation. In other words, data-driven scholarship is not such a new framework for research, at least in the humanities. In the cases that we have discussed this morning, some very traditional philological methods would be right at home in the digital world.
What does seem new in the so-called ‘fourth paradigm’, and what represents at least in part a significant challenge and opportunity, is the formalization of the very traditional interpretive activities of data-mining, pattern-matching, and story-telling or simulation in powerful algorithms that represent large and complex sets of data in terms of multiple features and variables that can be analyzed, tested, replicated, and changed at the scale and speed afforded by advanced computation. The promise of these new automated capabilities is that new knowledge can be created in ways that were not previously possible. Projects in the humanities, particularly in text analysis, have moved the needle noticeably forward, but they have also highlighted data curation problems that are common in form to those being addressed across many fields and disciplines. Scholars in renaissance and early modern studies, as we have seen, have an ambitious project to understand the creation of new genres of communication in the era of the early printed book. Building on the large-scale digitization efforts of Google, ProQuest, and others, these scholars have generated a substantial curation agenda that is required to advance their research. The standard bibliography in the field has to be modified so that scholars can reliably define the proportion of the total corpus on which they are computing. They are defining mechanisms, which include crowdsourcing, for the correction of OCR, so that they can improve the accuracy of the digital transcription of the texts. And, like the papyrologists and medievalists, they are also defining procedures and apparatus for publishing online scholarly editions of these texts.
2. Publication
This brings me to the issue of publication. In each of the three projects that I have described, there is a careful and deliberate engagement with the technology to re-examine the time-honored and special form of academic publication known as the critical edition. Critical editions are reliable, authoritative presentations of primary source evidence. There are two types of editions well known in the humanities: documentary and literary editions. A documentary edition generally consists of an authoritative compilation and transcription of a set of letters, manuscripts, or other documents, usually of historical value. A literary edition is a type of documentary edition that presents a literary text or related set of texts. The practice of edition-making includes methods of selection, transcription, and annotation. Annotation typically takes the highly structured form of a critical apparatus in which very detailed observations are made about the texts: editorial inclusions and exclusions are justified, the validity and provenance of the sources are attested, and detailed explanatory notes about people, places, and events mentioned in the documents are provided.
Technology makes it possible to subdivide the labor of this complex work in interesting new ways, with participants spread over time and place. For example, provided that their work is subject to peer review and quality control mechanisms, students and amateurs can be involved in transcription and annotation. Commenting on the Integrating Digital Papyrology initiative, Professor Roger Bagnall of New York University has observed that new divisions of labor make it difficult to see the resource, and others like it, ‘as representing at any given moment a synthesis of fixed data directed by a central management; rather, we see it as a constantly changing set of fully open data sources governed by the scholarly community and maintained by all active scholars who care to participate. One might go so far as to say that we see this nexus of papyrological resources as ceasing to be “projects” and turning instead into a community’.
Bagnall continues: ‘Part of that evolution implies an abandonment of the distinction operative in this and other fields until now between editing texts and maintaining textual databanks’. Because new editions can be created inside the editorial system itself, these editions ‘become publicly visible only when the editor chooses to make them so. Instead of someone retyping texts published in print form into the database, the boundary between scholarly creation and the database is a dynamic one controlled by individual contributors, the editing software, and the board of editors—capable of effacement at a keystroke’.¹⁶
Academic publishing and curation systems need to change to incorporate this new model of the scholarly textual edition as a dynamic textual product. Moreover, the textual edition/databank, with its richly developed critical apparatus, represents a complex mechanism that scholars have devised over time to make arguments by moving directly within and among the primary sources and arranging them, not simply by referring to them. This edition-based, immersive model of argument is now being adapted from textual sources and tried out in a variety of other fields and with other types of materials, as scholars experiment with digital technologies to arrange audio and visual sources, maps, and now three-dimensional models in order to present evidence in ways that make effective audio-visual, spatial, and kinetic arguments.¹⁷ We are in the early days of what promises to be a boom cycle of innovation in these areas.
3. Infrastructure
In most cycles of innovation, there is a need for common, enabling technology that serves as shared infrastructure. Without this infrastructure, the cycle of development may stall, and the most promising effects will be lost or minimized rather than shared broadly. The papyrology, medieval manuscript, and early printed book projects that I have described have highlighted the need for at least two basic forms of shared infrastructure.
First, annotation is a pervasive element of scholarly practice, and in each of the three projects, finding practical and transparent ways to record and credit critical notes, emendations, and other work has been a persistent source of concern. Even transcription or translation may be thought of as a form of annotation on an original source. Scholars in these and other projects increasingly complain about the limitations of annotation tools, which do not make it easy to move either among different collections and resources of the same type or across different types of media such as text and video. Nor do these tools permit easy sharing of annotations with other scholars; they do not even allow individual scholars to store all of their annotations together.
In order to address these problems, an international group of computer and information scientists and humanities scholars from the University of Illinois at Urbana-Champaign, Los Alamos National Laboratory, the University of Maryland, and the University of Queensland in Australia formed the Open Annotation Collaboration. With Mellon Foundation support, they have developed a set of standards, built on the linked data standards of the Semantic Web, to enable more effective interaction between scholarly annotation tools and digital collections. The standards encompass annotation of any media type, and annotation of content stored locally on a personal computer as well as remotely in Web-based databases such as JSTOR or ARTstor. The Open Annotation Collaboration is currently implementing, testing, and refining the standards in eight scholarly and publication initiatives in a variety of fields.¹⁸ The National Information Standards Organization has also recently convened several workshops to explore how these annotation standards could be adopted even more widely.¹⁹
A second kind of enabling infrastructure that would assist in the development of these projects is what has come to be known as a virtual research environment, or VRE.²⁰ A VRE provides a framework that makes it easy to stitch together the standard components of scholarly research conducted online, such as interfaces for content, content management, and tools. Standard authentication and authorization mechanisms are a critical part of these interfaces, so that scholars do not have to address these issues repeatedly as they become more flexible and adept in online teaching and research. The Bamboo project, based at the University of California at Berkeley, is an attempt to build virtual research environment infrastructure that colleges and universities could implement broadly for use in the humanities.²¹
Conclusion
Let me now conclude with a few final reflections. I have described three substantial projects in the humanities involving the imaginative use of digital technologies. These initiatives suggest some rather profound changes in the ways that scholars relate to the primary sources of evidence in the fields of classical, medieval, and early modern studies. I have further indicated how these changes relate to the broader ecology of scholarly communications, in which new forms of data curation, publishing, and technical infrastructure are emerging.
It is important to recognize that much of the significant work now underway is methodological, requiring detailed attention to questions about how to encode or represent sources, how to design and build tools, and how to structure and evaluate analyses and their results. Some have worried that such attention to method obscures the substantive scholarship and, in some cases, covers for the absence of any substantive scholarship at all. It is, of course, important to guard against extreme positions that imagine scholarship can somehow exist as pure thought, without the application of any method. It is also useful to attend to historians who have observed the ebbs and flows of methodological creativity in the development of scholarly activity.
Tom Scheinfeldt at George Mason University is one of those historians. He recently addressed the point about ebbs and flows on his blog. He wrote:
One of the things digital humanities shares with the sciences is a heavy reliance on instruments, on tools. Sometimes new tools are built to answer pre-existing questions. Sometimes…new questions and answers are the byproduct of the creation of new tools. Sometimes it takes a while, in which meantime tools themselves and the whiz-bang effects they produce must be the focus of scholarly attention. Like 18th century natural philosophers confronted with a deluge of strange new tools like microscopes, air pumps, and electrical machines, maybe we need time to articulate our digital apparatus, to produce new phenomena that we can neither anticipate nor explain immediately.²²
I submit that what I have described briefly in this paper is exactly what Scheinfeldt had in mind: the creative ‘articulation’ of digital technologies in fields of study that are fueled by crucial questions of broad interest and by compelling research and teaching agendas that address those questions. I wonder if you agree.
Acknowledgements
I wish to thank my colleagues Helen Cullyer and Philip Lewis for their helpful comments on an earlier draft of this paper.
Endnotes
1 I. F. McNeely with L. Wolverton, Reinventing Knowledge: From Alexandria to the Internet (New York, 2008), xx.
2 McNeely, Reinventing Knowledge, xx-xxi.
3 H. Cullyer and D. J. Waters, ‘Priorities for the Scholarly Communications Program’, in The Andrew W. Mellon Foundation, Report from January 1, 2008 through December 31, 2008 (New York, 2009), 34-51, http://www.mellon.org/news_publications/annual-reports-essays/annualreports/content2008.pdf , last visited on 30 March 2012.
4 Integrating Digital Papyrology, http://idp.atlantides.org/trac/idp/wiki/ , last visited on 30 March 2012. See also J. Sosin, ‘Digital Papyrology’, Congress of the International Association of Papyrologists, Geneva, 19 August 2010, http://www.stoa.org/archives/1263 , last visited on 30 March 2012; and R. S. Bagnall, ‘Integrating Digital Papyrology’, presented at Online Humanities Scholarship: The Shape of Things to Come, University of Virginia, March 26–28, 2010, http://archive.nyu.edu/handle/2451/29592 , last visited on 30 March 2012.
5 Parker Library on the Web, http://parkerweb.stanford.edu , last visited on 30 March 2012.
6 Roman de la Rose Digital Library, http://romandelarose.org/ , last visited on 30 March 2012.
7 e-codices: the Virtual Manuscript Library of Switzerland, http://www.e-codices.unifr.ch/ , last visited on 30 March 2012. See also Codices Electronici Sangallenses (CESG) – Virtual Library, http://www.cesg.unifr.ch/en/index.htm , last visited on 30 March 2012; and Carolingian Culture at Reichenau and St. Gall, http://www.stgallplan.org/index.html , last visited on 30 March 2012.
8 SharedCanvas: A Distributed Canvas Rendered from Linked Data Annotations, http://www.shared-canvas.org/ , last visited on 30 March 2012. See also R. Sanderson, B. Albritton, R. Schwemmer, and H. Van de Sompel, ‘SharedCanvas: A Collaborative Model for Medieval Manuscript Layout Dissemination’, Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries, Ottawa, June 2011, http://www.shared-canvas.org/slides/docs/sdh11-paper.pdf , last visited on 30 March 2012; and R. Sanderson and B. Albritton, ‘SharedCanvas: Collaborative Facsimiles’, Books in Browsers 2011, San Francisco, USA, October 2011, http://www.shared-canvas.org/slides/docs/bib11-slides.pdf , last visited on 30 March 2012.
9 T-PEN: Transcription for Paleographical and Editorial Notation, http://tpen.org/TPEN/ , last visited on 30 March 2012.
10 T. Cramer, ‘The International Image Interoperability Framework (IIIF): Laying the Foundation for Common Services, Integrated Resources and a Marketplace of Tools for Scholars Worldwide’, Project Briefings/Presentations, CNI Fall 2011 Membership Meeting, Arlington, VA, December 12-13, 2011, http://www.cni.org/topics/information-access-retrieval/international-image-interoperability-framework/ , last visited on 30 March 2012.
11 OCR Summit Meeting, http://idhmc.tamu.edu/ocr-summit-meeting/ , last visited on 30 March 2012.
12 J. Hope and M. Witmore, ‘The Hundredth Psalm to the Tune of “Green Sleeves”: Digital Approaches to Shakespeare’s Language of Genre’, Shakespeare Quarterly, 61, no. 3 (2010), 357-390. See also M. Witmore, ‘Text: A Massively Addressable Object’, 2010, http://winedarksea.org/?p=926 , last visited on 30 March 2012.
13 See R. Valenza, ‘Visualizing English Print from 1470 to 1800’, a proposal submitted to D. J. Waters, the Andrew W. Mellon Foundation, 28 April 2011, https://mywebspace.wisc.edu/valenza/web/VisualizingEnglishPrint/VisualizingEnglishPrintUWMadisonValenzaApril28fordownload.pdf , last visited on 30 March 2012.
14 J. McGann, ‘A Note on the Current State of Humanities Scholarship’, The Future of Criticism – A Critical Inquiry Symposium, Critical Inquiry, 30, no. 2 (2004), 409-413, cited here at 410.
15 See, for example, T. Hey, S. Tansley, and K. Tolle, eds, The Fourth Paradigm: Data-Intensive Scientific Discovery (Redmond, WA, 2009).
16 Bagnall, ‘Integrating’, 2-3.
17 For examples of visual arguments of this type, see A. Juhasz, Learning from YouTube (Cambridge, MA, 2011), http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=12596 , last visited on 30 March 2012; and M. F. Delmont, The Nicest Kids in Town: American Bandstand, Rock 'n' Roll, and the Struggle for Civil Rights in 1950s Philadelphia, http://scalar.usc.edu/nehvectors/nicest-kids/index , last visited on 30 March 2012. Both works are products of the Alliance for Networking Visual Culture, http://scalar.usc.edu/anvc/ , last visited on 30 March 2012.
18 Open Annotation Collaboration, http://www.openannotation.org/ , last visited on 30 March 2012.
19 E-Book Annotation Sharing and Social Reading, http://www.niso.org/topics/ccm/e-book_annotation/ , last visited on 30 March 2012.
20 See, for example, M. Fraser, ‘Virtual Research Environments: Overview and Activity’, Ariadne, 44 (July 2005), http://www.ariadne.ac.uk/issue44/fraser/ , last visited on 30 March 2012.
21 Project Bamboo, http://www.projectbamboo.org/ , last visited on 30 March 2012.
22 T. Scheinfeldt, ‘Where’s the Beef? Does Digital Humanities Have to Answer Questions?’, Found History, 12 May 2010, http://www.foundhistory.org/2010/05/12/wheres-the-beef-does-digital-humanities-have-to-answer-questions/ , last visited on 30 March 2012.