Report on the Digital Humanities 2011 conference (DH2011)

General Information

Digital Humanities 2011, the annual international conference of the Alliance of Digital Humanities Organisations (ADHO), was hosted by Stanford University Library from 19th to 23rd June. The conference featured opening and closing plenary talks given by David Rumsey and JB Michel & Erez Lieberman-Aiden respectively, the Zampolli Prize Lecture given by Chad Gaffield, three days of academic programme with four strands each, a large poster session, as well as a complementary social programme. The highlights of the latter included a conference banquet at the Computer History Museum in Mountain View, and excursions to Silicon Valley, the Sonoma Wine Country, and Literary San Francisco. The conference website is at <https://dh2011.stanford.edu/>, from which the conference abstracts can be downloaded. The conference's Twitter hashtag is #dh11. The following is a summary of the sessions attended.

Opening keynote

“Reading Historical Maps Digitally: How Spatial Technologies Can Enable Close, Distant and Dynamic Interpretations” - David Rumsey, Cartography Associates

David Rumsey noted that the conference programme featured spatial indicators in many of the presentations, which demonstrates a cross-disciplinary interest in spatial dimensions and associated topics, such as exploration and visualization. Maps are extremely complex spatial constructs that combine text and visualization, and which challenge us as particularly dense information systems. A multitude of visualization approaches and tools are used to “read” and unlock historical maps that are far removed from us. Maps contain historical, cultural, and political information, but they are also objects of art with their own visual idiosyncrasies and complexities. Historical maps have been intensively used in teaching and scholarship for a long time; the large number of requests for reproductions that Rumsey's website <http://www.davidrumsey.com/> receives testifies to the interest from a multitude of subjects.

Reading both the image and the text using digital tools is key to understanding historical maps. These readings can be close, distant, and dynamic. For the purpose of close reading, scholars have for centuries used a variety of tools as well as facsimiles and annotations as visualizations of aspects of maps. Modern technology can help us to visualize and layer these notes more effectively. Distant reading is facilitated by zooming out of individual maps and focussing on the multitude of representations of the same geospatial zone, e.g. by looking at hundreds of thumbnails at once, as enabled by the Luna image library. Luna provides access to a large maps database and offers advanced image compression, quick overviews, collection building, atlases, full views of maps, and improved image manipulation. The coverage of individual maps can be visualized on top of larger maps to give a quick overview of a large number of maps. At the same time, additional serendipity is offered via the website's ticker feature, which shows a random selection of maps with just a minimal metadata record. Dynamic reading, finally, happens in geographic information systems (GIS), which facilitate three-dimensional visualizations. GIS tools that enable the geo-referencing of historical maps are key to reading them dynamically.
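Geo-referencing in this sense means fitting a transformation that maps pixel coordinates on a scanned map to geographic coordinates, usually from a handful of ground control points. A minimal sketch of such an affine fit, with invented control points purely for illustration:

```python
import numpy as np

# Hypothetical ground control points: (pixel_x, pixel_y) -> (longitude, latitude)
pixels = np.array([[120, 80], [900, 95], [150, 700], [880, 690]], dtype=float)
geo = np.array([[-122.52, 37.83], [-122.35, 37.83],
                [-122.52, 37.70], [-122.35, 37.70]], dtype=float)

# Fit an affine transform geo = [px, py, 1] @ coeffs by least squares
design = np.hstack([pixels, np.ones((len(pixels), 1))])   # shape (n, 3)
coeffs, *_ = np.linalg.lstsq(design, geo, rcond=None)     # shape (3, 2)

def pixel_to_geo(px, py):
    """Map a pixel position on the scanned map to (lon, lat)."""
    return np.array([px, py, 1.0]) @ coeffs

print(pixel_to_geo(500, 400))  # rough lon/lat near the centre of the scan
```

Real workflows use many more control points and dedicated tools such as GDAL or the geo-referencers built into GIS packages, but the principle of aligning an old map with modern coordinates is the same.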
Once geo-referenced, older maps can be displayed alongside newer maps over long periods of time, altitudes can be highlighted in 3D, and maps can be joined together to reflect political, cultural, or historical areas. Precise overlaying of maps is another powerful tool for researchers that helps with analysis and interpretation. In the near future, the automated pixel-based extraction of information from maps, the OCR'ing of maps, will be a powerful tool for identifying places, producing indexes, normalizing information on disparate maps, and aligning information with historical gazetteers. Dynamic reading thus combines close and distant reading, overlays help the thinking process, and 3D helps with a visually centred interpretation of historical maps. Virtual worlds such as Second Life or OpenSim also offer additional possibilities: they offer potentially infinite three-dimensional space that can be used to lay out large maps, or to fly through the large areas covered by maps. They also allow for the creation of a “tower” of thumbnails that offers a quick overview of large numbers of maps. However, some maps are not visualizations of spaces; instead they offer symbolic gateways to space. Being able to visualize these symbolic information systems is a fascinating new avenue, and new ways of understanding may thus evolve. The challenges ahead include effective pattern recognition in maps, the move of GIS to Web-based services, and the opportunities and challenges of crowd-sourced GIS.

Session one

“Layer upon layer: computational ‘archaeology’ in 15th century Middle Dutch historiography” – Rombert Stapel, Fryske Akademy (KNAW) / Leiden University, Netherlands

Texts in the Middle Ages are characterized by a number of interfering features, such as transmission through copyists. Few original works have actually been handed down. It is difficult for scholars to “read through” these distancing layers, but some progress can be made with computational techniques. This paper focuses on the Teutonic Order's chronicles, the Croniken van der Duytscher Oirden, a late 15th-century chronicle of the history of the Teutonic Order. The chronicles cover the time from biblical origins to the Thirteen Years' War against the Polish King and the Prussian orders. A new manuscript has recently been identified in Vienna, and it is considered an extremely rare autograph of Hendrik Gerardz Van Vianen, who is the author of a relatively large number of texts and a well-documented figure. An interesting question is whether he was the author or a compiler of different sources, although in the Middle Ages this distinction did not exist or was not strictly observed; both roles frequently coexist in one text. We know that the prologue claims to have been written by the Bishop of Paderborn, who was present at the foundation of the order, although this is unlikely. Similarly, the ending is likely to have been added by a different scribe. Original composition is therefore a difficult concept, as is the “intended audience”. Traditional methods have helped to identify Hendrik as one of the authors, but for some parts this is more difficult to determine. The use of computational techniques, in the form of free, easy-to-use and quick-to-learn tools, offers a new avenue into the text. Burrows' Delta, a leading method of authorship attribution that measures stylistic difference based on word frequency analysis, has been used for this purpose. The base text was encoded in TEI/XML, which will also be used for creating an edition of the text.
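Burrows' Delta compares texts by the z-scores of their most frequent words: for each candidate author, Delta is the mean absolute difference between the z-scores of the disputed sample and those of the candidate's samples, with lower values indicating greater stylistic closeness. A minimal sketch of that calculation, with tokenization and sample handling deliberately simplified:

```python
import re
from collections import Counter
import numpy as np

def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one sample."""
    tokens = re.findall(r"[a-zà-ÿ]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return np.array([counts[w] / total for w in vocab])

def burrows_delta(disputed, candidates, n_words=150):
    """candidates: dict mapping author name -> list of sample texts."""
    # Most-frequent-word list over the whole comparison corpus
    all_text = " ".join(t for texts in candidates.values() for t in texts)
    vocab = [w for w, _ in
             Counter(re.findall(r"[a-zà-ÿ]+", all_text.lower())).most_common(n_words)]

    # Frequency matrix over all candidate samples gives the corpus mean and std
    matrix = np.array([relative_freqs(t, vocab)
                       for texts in candidates.values() for t in texts])
    mean, std = matrix.mean(axis=0), matrix.std(axis=0) + 1e-12

    z_disputed = (relative_freqs(disputed, vocab) - mean) / std
    scores = {}
    for author, texts in candidates.items():
        z_author = (np.array([relative_freqs(t, vocab)
                              for t in texts]).mean(axis=0) - mean) / std
        scores[author] = float(np.abs(z_disputed - z_author).mean())
    return scores  # lower Delta = stylistically closer
```

In a study like this one, the "samples" would be chapter-sized chunks of the Croniken and of securely attributed comparison texts.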
For the testing of the Delta method, the narrative was sampled according to expectations and knowledge of the text and then tested against those hermeneutic assumptions. The Delta method was found to work well with a set of samples used for verification. Following this successful run, the method was set up and run on different primary samples. It was verified that parts of the prologue are indeed not in Hendrik's hand, and also that the ending is not in the original writer's hand. As a result, it can be concluded that both the prologue and the ending must have pre-existed Hendrik's narrative in the Croniken. Both texts must have existed in the near temporal and spatial vicinity of the original composition and have been absorbed into the Croniken.

“The UCLA Encyclopedia of Egyptology: Lessons Learned” - Willeke Wendrich, UCLA

The UCLA Encyclopedia of Egyptology is a 5-year, NEH-funded project that is producing a resource, primarily intended for undergraduate students, that helps with finding and using quality scholarly resources in the field of Egyptology. The resource exists in two versions: an open one, published in the UCLA eScholarship online system <http://escholarship.org/uc/nelc_uee>, which contains all the in-depth information about the subject, and a full version, which in addition includes marked-up texts, maps, and many advanced features <http://www.uee.ucla.edu/>. The full version offers a universal system for accessing information about Egyptology, the discipline, and its constituent scholarship. Each article has a map associated with it that shows the places mentioned and any images available. The map can also be used as a browsing tool to get into the subject; it always links back to the scholarly resources available for a particular place or region.

The project really comprises two projects: an editorial project and an online publishing project. Google Docs is used as a common editing tool for articles, Skype is used for meetings, and a review component is also available for the whole acceptance process for new resources: everything is peer-reviewed. For the open version the workflow involves the submission of an article, its review, revision, copyediting, and finally online publication. For the full version of the website, each article is also encoded in TEI/XML and is made available in additional ways, e.g. via the map. The Encyclopedia's bibliography and place name database are the biggest constituent parts of the project. The full version also makes regions, sites, and features available for browsing. Additionally, there are three timelines, which record the creation, modification, and occupation of places or regions.

There have been several lessons learned: sustainability is a huge issue for a large project like the Encyclopedia, both in terms of preservation and of access. Financial sustainability is also a major issue. The open version is free; the full version, however, will be subscription-based to cover the ongoing costs of development, the editorial process, markup and metadata, and the use of Google Maps, as well as marketing and subscription handling. A number of possible models are currently being discussed, such as trying to establish a permanent endowment, a subscription-based model, or advertisements as a revenue source.

“Possible Worlds: Authorial markup and digital scholarship” - Julia Flanders, WWP, Brown University

There has always been a conceptual tension between two traditions of markup: editorial markup vs. the markup of meaning (authorial markup).
These traditions have different textual commitments and enact different relationships between the text, the encoder, and the reader. Mimetic (editorial) markup is about getting the text as an artefact to the reader. The authorial markup model is more concerned with transporting interpretation as a scholarly tool. The performative quality of markup is highlighted as a way of conveying meaning. The rhetoric of authorial markup, its transactions between encoder and reader, revolves around the communication of the encoder's ideas and beliefs about a text as a means of interpretative work, for which the text is primarily the carrier and reference. In the TEI/XML context, the @ana attribute and the <interp> element are important mechanisms for enacting this performance; TEI/XML is here used as an authoring system. The authorial markup tradition often cites a journal article encoded in TEI/XML as an example, where markup brings textual features into being. Authorial markup captures the primary text and the interpretative reading; the text with authorial markup reveals the performance of the critical reading. The result is a composite text that is no longer mimetic, but performative and dynamic. Nuanced semantic differences are thus revealed: markup becomes a world-constructing means of expression. This is a departure from the claim of mimetic markup to truth. Meaning is no longer in the text alone; it evolves with the performance of the authorial markup, in the reading of the text itself, not in its materiality. Markup expresses the critical idea, an interpretative space in which the encoder plays out the performance.

This type of markup enables speculative approaches to the encoding of text, e.g. encoding a narrative passage as a poem. Taking away expectations and presumptions may allow us to see different feature patterns in texts. This is sometimes called counter-factual or “possible worlds” markup. The editorial and critical debate surrounding a text is another form of markup, e.g. the genetic encoding developments in the TEI community, which allow for a discussion within the markup of the text. Genre in this form of markup is sometimes reconsidered; texts might be formalized in completely different ways based on our readings of them. There are three important discussions: firstly, the suggestion of a reading brings a world into being; secondly, it creates a conceptual space for the performance, e.g. certain readings might be elevated to schemas away from the actual text, which becomes an instantiation; thirdly, ODDs might be seen as a way of discussing the relationships between different schemas and the ways they can work on the texts that have been included as instantiations of these schemas. The problems and questions remaining are: can “possible worlds” markup express the freedom it seeks without sacrificing the truth-value of these “possible worlds”; how can meaning in the markup be conveyed to the reader; and how can readings be made more accessible to the reader as possible interpretations of the text?

Session two

“Expressive power of markup languages and graph structures” - Yves Marcoux, Université de Montréal, Canada

This paper addresses the possibility of overcoming certain shortcomings of XML, such as its inability to express non-hierarchical (overlapping) structures, in order to increase its expressive power for use in the digital humanities. The structure of marked-up documents can be represented in the form of graphs (DOMs). An XML document conforms to the subclass of graphs known as trees.
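A small illustration of that correspondence, assuming the lxml and networkx libraries: the element structure of any well-formed XML document can be loaded into a graph, and that graph is always a tree (a single root, no cycles, and exactly one parent for every node except the root).

```python
import networkx as nx
from lxml import etree

xml = """<poem>
  <stanza><line>The curfew tolls</line><line>the knell of parting day</line></stanza>
</poem>"""

root = etree.fromstring(xml)

# Build a directed graph from the parent-child relations of the DOM
graph = nx.DiGraph()
for element in root.iter():
    for child in element:
        graph.add_edge(element, child)

# The element structure of a well-formed XML document is always a tree
print(nx.is_tree(graph.to_undirected()))  # True
```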
There is thus a perfect correspondence, in the sense that any tree can be expressed as an XML document. It is clear, however, that we need more complex structures than trees. Information is not always hierarchical: the verse and sentence structures in a poem, for example, do not necessarily nest properly, and hosting both overlapping structures in the same document is not currently possible in pure XML. The same is true for speech and line structures, which often require recording discontinuity. The general problem is the application of multiple structures to the same content. While these are problems that cannot easily be expressed in XML, they also make it possible to think of different ways of encoding the same instantiation. There have been various proposed solutions: staying in XML but managing the issues in XML applications such as TEI, or extending XML itself, as in TexMECS, an XML-like markup language allowing overlapping elements and other constructs. The presenter has investigated the options offered solely by the overlap mechanism of TexMECS, i.e. overlap-only TexMECS or OO-TexMECS, which accommodates documents that allow for multiple parenthood and do not require a total ordering on leaf nodes. This slightly extended markup language makes it possible to express child-ordered directed graphs (CODGs) that have only a single root, but it is unable to express multiple-root ones. Thus it has been shown that just adding overlap to XML does not allow for the more complex scenarios and structures that prompted the investigation into a more expressive form of XML. Future work will include optimal verification of serializations of documents into OO-TexMECS, optimal serialization of CODGs, graphs with partially ordered children, and other constructs of TexMECS not investigated here.

“Mining language resources from institutional repositories” - Gary F. Simons, SIL International and Graduate Institute of Applied Linguistics

Language resources (texts, recordings, research, dictionaries, grammars, tools, standards) are fundamentally important for any linguist, but they are not always easily discoverable or accessible. This paper presents work done in the context of the Open Language Archives Community (OLAC) <http://www.language-archives.org>, which was founded in 2000. The “OLAC: Accessing the World's Language Resources” project has produced metadata usage guidelines, an infrastructure of text-mining methods, and a set of best practices for the community creating language corpora. It has created the OLAC Language Resource Catalogue, which makes the world's language resources discoverable. The problem is that there is no conventional search mechanism that can find all the language resources out there, particularly those hidden in the deep web or resources whose languages are not uniquely identified by names. One systematic solution is to focus on accessing institutional repositories, in which universities now preserve their language resources. The method employed consists of finding descriptions of language resources, extracting the languages represented therein, and using harvesting to retrieve the metadata about these resources. MALLET (MAchine Learning for LanguagE Toolkit) was used to train a classifier to identify language resources, with the sample set taken from the LoC catalogue; a Python function was then used to extract the names of the languages, using the ISO 639-3 codes from the LoC subject headings, and to normalize these lists using controlled vocabularies and stop lists.
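A minimal sketch of the kind of extraction step described here, matching subject-heading strings against an ISO 639-3 name table and filtering with a stop list; the lookup table, the stop list, and the record layout are invented for illustration, not taken from the OLAC project:

```python
# Toy stand-ins; the real workflow draws on the full ISO 639-3 code set
# and curated stop lists and controlled vocabularies.
ISO_639_3 = {
    "swahili": "swa",
    "quechua": "que",
    "yoruba": "yor",
}
STOP_LIST = {"language", "languages", "linguistics"}  # words that never name a language

def extract_language_codes(subject_headings):
    """Return ISO 639-3 codes for languages named in a record's subject headings."""
    codes = set()
    for heading in subject_headings:
        # LoC-style headings are typically split into subdivisions with ' -- '
        for part in heading.lower().split("--"):
            for word in part.strip().split():
                if word in STOP_LIST:
                    continue
                if word in ISO_639_3:
                    codes.add(ISO_639_3[word])
    return sorted(codes)

print(extract_language_codes(["Swahili language -- Grammar", "Field linguistics"]))
# ['swa']
```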
The OAI harvester was given the 459 resulting URLs, and the harvest yielded 5 million Dublin Core metadata records, which were then piped through the classifier and extractor, leaving roughly 70,000 potential language resources. But the question remained: which of these should be entered into the OLAC catalogue? An additional language-code identification filter was run on the results, leaving about 23,000 records that were considered genuine language resources. This automated process resulted in a 79% accuracy rate for the language-extraction algorithm and 72% precision for language identification. However, a number of known problems remain: non-English metadata, names used as adjectives of ethnicity or place, language names used as place names, words missing from stop lists, an incorrect weighting heuristic, and incomplete metadata records in which languages are not explicitly identified. As a result of the work, 22,165 language resources were identified with acceptable precision, but more work to tweak the algorithms remains to be done.

“Integration of Distributed Text Resources by Using Schema Matching Techniques” - Thomas Eckart, Natural Language Processing Group, Institute of Computer Science, University of Leipzig

This investigation builds on eAQUA (2008-11), a text-mining project working with ancient Greek and Latin resources. While the importance of the use of standards has long been recognized in the community, there is still a great variety of data types, editorial decisions, and encoding solutions, and almost one third of the original project's resources was devoted to this initial integration work. Heterogeneity is a huge problem and often a result of distributed teams working on data with different research foci, different skill sets, and different tools. In addition, there is heterogeneity of data models (XML, databases), technical and infrastructure heterogeneity, as well as semantic heterogeneity. The common approach to solving these issues is of course the use of standards, e.g. TEI/XML, DocBook, EPUB, TCF, but frequently the need to create extensions produces dialects of the chosen standard. The solution this paper proposes is schema matching. It consists of schema mapping (correspondences between elements in two schemata) and the automated detection, or schema matching, of these correspondences. This approach is often used when merging large database management systems. The methods used include profiles (pairwise comparison of elements), features (name similarity, path similarity), instance-based features, and distribution-based features. The corpora used for the work were different versions of the Duke Databank of Documentary Papyri (DDbDP) in TEI/XML, its EpiDoc-encoded equivalent (Epiduke), and an extraction from the latter stored in a flat relational schema. To find corresponding elements, the techniques of fingerprinting of elements, pairwise linking, and scoring according to similarity were used. All results were normalized to the interval [0,1], where '0' corresponds to no similarity and '1' to identity. The various approaches have shown that semantic approaches in particular are promising for identifying similar elements. For cases with very little semantic overlap, structural analysis can also be taken into account, but this has proven valuable only where complex structures exist. Future work will include the automatic determination of weights, text normalization strategies, and the analysis of microstructures (keywords, NER, chunking). New use cases are also required.
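A minimal sketch of the kind of pairwise element scoring described here, combining a name-similarity and a path-similarity feature and normalizing the result to the interval [0,1]; the element paths and the feature weights are invented for illustration:

```python
from difflib import SequenceMatcher
from itertools import product

def name_similarity(a, b):
    """String similarity of two element names, in [0, 1]."""
    return SequenceMatcher(None, a.split("/")[-1], b.split("/")[-1]).ratio()

def path_similarity(a, b):
    """Overlap of the ancestor paths of two elements, in [0, 1]."""
    steps_a, steps_b = set(a.split("/")), set(b.split("/"))
    return len(steps_a & steps_b) / max(len(steps_a | steps_b), 1)

def match_schemas(schema_a, schema_b, w_name=0.6, w_path=0.4):
    """Score every element pair of two schemas; high scores suggest a correspondence."""
    scores = {}
    for a, b in product(schema_a, schema_b):
        scores[(a, b)] = w_name * name_similarity(a, b) + w_path * path_similarity(a, b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy element paths standing in for two encodings of the same papyri corpus
tei = ["TEI/text/body/div/ab", "TEI/teiHeader/fileDesc/titleStmt/title"]
flat = ["document/text", "document/title"]
for pair, score in match_schemas(tei, flat):
    print(f"{score:.2f}  {pair}")
```

Instance-based and distribution-based features, as used in the paper, would add comparisons of the actual element contents to these purely schema-level scores.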
Session three

“gMan: Creating general-purpose virtual environments for (digital) archival research” - Mark Hedges, Centre for e-Research, King's College London

Project gMan <http://gman.cerch.kcl.ac.uk/> is a JISC-funded project, part of their VRE programme, which builds on initial work done by DARIAH. The context for the project is that the increasingly data-driven humanities have led to the development of a variety of VREs for particular purposes, disciplines, or activities. gMan investigates the possibility of a more general-purpose system that supports day-to-day research activities (simple generic actions, complex processes). The aim of gMan is to build a framework for generic VREs, to support research built up from generic actions. The system is built on the gCube development platform <http://www.gcube-system.org/>, with content and content-model primitives expressed in RDF. It thus enables both the creation of a VRE management framework and a Virtual Organization (VO) model. Features of the system include the ability to import data and to work on it collaboratively; it offers tools as well as data. As an evaluation exercise of its key functionalities, the system has been tasked with assembling virtual collections, searching across collections, adding annotations, adding links between objects, searching annotations, and sharing materials with colleagues. Some of the challenges were that the data import procedures were created by the gCube team and adaptations between the data models were necessary, which is not really scalable. The infrastructure is based on EU project funding and is therefore not sustainable; a humanities VO needs its own sustainable infrastructure. The current infrastructure is really a mix of services that are provided by a number of different entities and sustained in different ways, something that needs to be addressed in the future. The system has its first application in the EHRI (European Holocaust Research Infrastructure) project, which deals with a large number of fragmented and dispersed archival sources and draws on DARIAH curation services for its collection. gCube offers a working research environment, not a publication framework, but it offers an insight into the research process, allows for provenance and justification, and can enhance publication.

“Opening the Gates: A New Model for Edition Production in a Time of Collaboration” - Meagan B. Timney, Electronic Textual Cultures Laboratory, University of Victoria

Digital literary studies has long since developed ways of working with texts digitally, particularly in the context of digital editions. Electronic editions need to be revisited in light of the new collaborative nature of the Web. Editions used to rely on the expertise of a single person or a small group of people; in the digital world this limitation appears artificial and unnecessary. Instead we need a dynamic edition model for representing text, a combination of text and tools that offers a dynamic interface to the edition. Hypertextual editions offer the easy connection of disparate resources, but presuppose a large library and a vast full-text base. These two models need to be united in a scholarly edition. The scholarly primitives developed by John Unsworth and others have been taken as the basis for the construction of the functionalities of the dynamic digital edition. This new type of edition challenges the authority of the editor of traditional archives and editions, and the social dynamics create a social edition that is collaborative and co-creative.
The work of the editor is then to curate the contributions of the individual contributors. Many of the Web 2.0 labels can be transferred to social editions, from incompleteness to open source. The paratext, rather than the text, thus becomes the focal point. The community of users is the underlying authority of the social edition, and interpretative changes based on user input are at its heart; it prioritises fluidity over stasis. Community-driven collaborative editing is at the centre of this new departure into a social digital humanities.

“When to ask for help: Evaluating projects for crowdsourcing” - Peter Organisciak, University of Illinois

Crowdsourcing represents collaborative work broken down into little tasks. The central questions are: when is crowdsourcing appropriate? How do you entice a crowd to care? How can crowdsourcing be utilised within the set framework of a research project? The presenter has investigated a sample of 300 sites that carry the term crowdsourcing in their Delicious tags and employ the method for their projects. Common methods employed include encoding aggregation (perception-based tasks utilizing the human capacity for abstraction and reasoning), knowledge aggregation (utilizing what people know, whether facts or experiences), and skills aggregation. The primary motivators identified have been interest in the topic, ease of entry and participation, altruism and meaningful contribution, sincerity, appeal to knowledge, and money. Secondary motivators are any indicators of progress and of one's own contribution, as well as positive system acknowledgement and feedback. Future work needs to take into account that the barriers to crowdsourcing are falling and that it is becoming easier; therefore more specific investigations into academic projects are required. However, crowdsourcing has also attracted criticism as being “unscholarly”, ethically questionable, a misuse of funding resources, and distastefully publicity-seeking.

Session four

“Evaluating Digital Scholarship: A Case Study in the Field of Literature” - Susan Schreibman, Digital Humanities Observatory; Laura Mandell

The evaluation of digital scholarship has been a central yet vexing issue for a long time, and the topic of many articles and discussions. The key is not simply evaluating but valuing digital scholarship: its legitimacy and contribution to the field must be acknowledged. Unfortunately, digital scholarship has coincided with strains on humanities funding and struggling academic presses. The many digital outputs of the digital humanities are not suited to print publication and often go unnoticed and unrewarded. Digital scholarship is also collaborative by nature, and there is little facility to recognize these collaborations and to reward them. Digital outputs have also often been relegated to a secondary form after the print publication, a mere adjunct. Many projects in the digital humanities that provide a service are also less frequently acknowledged: archives are often dismissed as what librarians do, programming as what technologists do, user education as what teachers do. Research is not seen as embedded in these endeavours. So it is about how we define research in our area: the many and various expressions of digital scholarship need to be taken up by evaluation bodies and recognized as scholarly activities and outputs. The humanities as a discipline have to broaden traditional concepts of scholarship to recognise these new developments.
The presenters have created a wiki to collect these points. There was also a workshop on the new ways in which the digital humanities work. The starting point was the evaluation of digital editions, one of the longest traditions in the digital humanities, but again editions, whether print or digital, are not considered scholarship. Technologies in the digital humanities are perceived as jargon. What is vital is a departure from print-based peer-review systems; however, we cling to them for lack of a better solution. Online publication is now so easy that everything can be online. In traditional thinking, quality is only visible in the publishing body behind the publication, e.g. an academic press. There is no question that digital scholarship must be evaluated, scrutinized, and made to pass the requirements of scholarship, but traditional evaluation systems fail to see beyond the prestige of the presses. Some first steps have been made: an NEH-funded NINES workshop has produced guidelines on how to evaluate digital scholarship, with another workshop to follow. Authorship in the context of collaborations is still one of the central problems and will be addressed specifically in these reports. The hope is to develop new modes of rewarding the hard work that goes into creating digital resources whose impact goes far beyond their own field.

“Modes of Composition in Three Authors” - David L. Hoover, New York University

This paper investigates the question of how mode of composition affects literary style. It examines three writers who changed their mode of composition, from handwriting to dictation (and back), either temporarily or permanently, namely Thomas Hardy, Joseph Conrad, and Walter Scott. These are cases in which the details of composition are well known and in which the changes take place within a single text. Hardy's novel A Laodicean is a good example, as the change in mode occurred after the first three instalments of the novel and switched back in the final sections. The change of mode from handwriting to dictation was caused by an illness which required him to lie on his back and from which he only slowly recovered. Word frequency analysis is used as the basis of the stylometric analysis. It became clear very quickly that there is no fundamental stylistic difference between the modes of composition, and a variety of other stylistic measures (mean word length and mean sentence length) confirm the observation. While the techniques employed are able to find subtle changes, these do not follow the mode of composition, as might be assumed for Hardy: style is dictated by the narrative structure and the progression of the story, not by the mode of composition. Conrad dictated parts of three of his works, The End of the Tether, The Shadow-Line, and The Rescue. As with Hardy, though the reason for the switch of mode was a different one in each case, there is little evidence from the stylometric analysis of any change attributable to the change of mode of composition; again, narrative structure is a much more powerful influence. Finally, the same can be observed in Walter Scott's novels The Bride of Lammermoor and Ivanhoe. More investigations into other forms of mode change, such as typewriting and word-processing, will be necessary before any generalizations are possible.
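A minimal sketch of the two simple stylistic measures mentioned above, mean word length and mean sentence length, computed over a plain-text sample; the tokenization is deliberately crude and the sample invented:

```python
import re

def mean_word_and_sentence_length(text):
    """Return (mean word length in characters, mean sentence length in words)."""
    # Crude sentence split on terminal punctuation; adequate for rough comparison
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    mean_word = sum(len(w) for w in words) / max(len(words), 1)
    mean_sentence = len(words) / max(len(sentences), 1)
    return mean_word, mean_sentence

sample = ("The knight rode out at dawn. He did not look back. "
          "By evening the towers of the castle had sunk beneath the hills.")
print(mean_word_and_sentence_length(sample))
```

Comparing such figures, and the word-frequency profiles, across the handwritten and dictated instalments is the kind of check the paper reports.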
“Names in Novels: an Experiment in Computational Stylistics” - Karina van Dalen-Oskam, Huygens Institute for the History of the Netherlands - KNAW, The Hague

This paper presents work made possible by the increasing availability of digital texts and tools in the field of literary onomastics, the study of the usage and functions of names in literary texts. The aim is to compare the usage and functions of proper names in literary texts between oeuvres, over time, and across places. A quantitative approach is helpful for discovering what is really happening in novels and leads to new questions. Named entity recognition and classification (NERC) is used as a proven method for this study. The corpus comprises 44 English and 20 Dutch novels; the focus is on novels published in the last 20 years. There are several levels of encoding in name usage: tokens, lemmas (normalized forms), mentions (one or more tokens), and entities (with one or more different lemmas); in name categories: personal names and geographical names; and in reference types: plot-internal names and plot-external names. Not much difference could be found between the two language corpora with regard to names. Genre plays a bigger part, e.g. children's books have a lot more named entities than non-children's books. There was also a difference between originals and translations, which led to a difference in both tokens and mentions. An example is Bakker's Boven is het stil and its translation The Twin: there are not many names in the novel, and the translator has not added any. A comparison of lemmas reveals that this novel has, unusually, more geographical names than personal names, and closer investigation shows that travel plays a special part in the novel. Explicitly formulated travel routes explain the territorial taboos of the main character; the geographic names thus reveal an important plot point. One important outcome of the project is the importance of genre: children's books form a distinct group among the novels investigated. The challenges for such work are that there are still not many novels available in digital form, tokenization is chaotic when used across languages, NERC tools do not really speed up the work or enhance its quality, and the drawback of focussing on only one level of encoding is inconsistent results when applied to one novel vs. a corpus. What is needed is a comprehensive textual work environment, a combination of concordancing and tagging options, dynamic self-learning NERC tools, and options for statistical analysis.

Session five

“The Interface of the Collection” panel session - Geoffrey Rockwell, University of Alberta, and members of the INKE project

Geoffrey Rockwell introduced the panel, in which members of the Interface Design team of the Implementing New Knowledge Environments (INKE) project discussed the manner in which interfaces are influenced by the structure of the materials included, by the history of traditions of representing collections, and by the intended use and needs of the users. The panel addressed questions such as: how have interfaces changed with the move to digital; how do we interact with interfaces; how do interfaces determine our engagement with the artefact? As the corpus is the body of the scholarly web, information is channelled through interfaces: interfaces mediate content and filter information.
“The Citation from Print to the Web” - Daniel Sondheim, University of Alberta

For scholars, citations are an important interface feature, and the multitude of citation design patterns from the earliest times is testimony to their importance. These patterns include absence, juxtaposition, the canonical citation, the footnote, and the citation of other media. The design of the citation reveals how we conceive of the topology of knowledge and the relationships between its constituent sources. The canonical citation is integrated into the text; it is a link to an external work that is not dependent on a particular source edition. Canonical citations on the Web exist, for example, in the Canonical Text Services (CTS) website. Juxtapositions are inline citations that are more clearly distinct from the text than a canonical citation; they highlight the difference between the body of the text and the annotations, yet also maintain a closeness between them, as they usually appear in the same place. Marginalia are a different kind of juxtaposition, even more closely connected to the main body of the text. Footnotes are a type of “elsewhere” note, where a symbol is used to connect the body to the annotation. On the Web, icons are often used to highlight these connections, like the camera icon on Google Earth. These three types of citation highlight the relationships between different types of knowledge.

“The Paper Drill” - Stan Ruecker, University of Alberta

The presenter demonstrated a new experimental interface, The Paper Drill, for navigating collections of articles in the humanities through citations. Scholars often use citations in research in the form of “chaining”, i.e. the use of citations as a way of expanding one's bibliography for a new, yet unfamiliar, field of research. The hope is to come up with an automated process for creating these chains of references (“citation trails”) in order to establish a good set of relevant articles for a particular topic. The idea is to find a seed article that will contain a number of top articles that get cited over and over again as relevant and are themselves cited as relevant one level down. We can then identify the authors of the top articles, which gives an additional avenue into subject areas. The interface presents the most frequently cited articles in the form of heatmaps, arranged according to date range and journal category.

“Diachronic View on Digital Collections Interfaces” - Mihaela Ilovan, University of Alberta

Interfaces to digital corpora make the collections manageable and usable, and their evolution over time offers an interesting insight into the development of both technology and design. The criteria for this investigation have been the age of the resources, complexity, version management, media implementations, and academic involvement. The study looked at Project Gutenberg, the Perseus Digital Library, and the Victorian Web. The inherent limitations of the study are a scarcity of reliable screen captures and a sparse bibliography on the history of interface development. Interface design is influenced by a number of factors, such as technology, users, and discourses. Web technology has most notably changed in the available screen resolution, bandwidth limitations, and formats and file sizes, all of which ultimately influence design choices. Users influence interfaces through expectations developed from interacting with many collections, thus supporting the standardization of terminologies and layout choices.
The ontological discourse is also a notable influence, e.g. in the shift from the project stage to the subsequent digital library stage, which coincided with a professionalization of bibliographic metadata. This historical approach to design studies offers valuable insights into the evolution of design choices, but it is influenced by factors not always intrinsic to the project, such as having to tell success stories.

“The Corpus from Print to Web” - Geoffrey Rockwell, University of Alberta

This paper investigates the design and evolution of corpus interfaces by comparing features of two domains, namely print and the Web. The study focuses on three types of corpora: linguistic, literary, and artefactual. Linguistic corpora are very much defined by the user community of linguists; they offer many options for refined searches and complex functionalities. Linguistic corpora in print do exist; they offer a streamlined view of the data, with the usual table of contents, indices, glossaries, and abbreviations as possible avenues into the collection. On the Web, a complex browsing/search box usually replaces the table of contents and indices; the search very much defines the way into the collection. In artefactual corpora the limitations of print often force authors to split up their materials and to make choices about their physical organisation. On the Web the organisation is less based on physical arrangement; it is search-based and offers a variety of levels of detail, from simple thumbnails to rich metadata-centred overviews. Printed literary corpora equally feature tables of contents, bibliographies, glossaries, and errata as typical organisational features. On the Web, search is the dominant feature, along with browsing by well-established features such as authors, works, timelines, and places. To conclude, in the transition from print to web the following can be observed: the introduction of automated search, a shift from narrative to database, different views once decoupled from physical arrangements, new modes of interaction (dynamic views, tours), and a loss of authorial control/interpretation (still visible in essays, tours, etc.).

Session six

“Topic Modeling Historical Sources: Analyzing the Diary of Martha Ballard” - Cameron Blevins, Stanford University

Historian Laurel Ulrich's A Midwife's Tale is the starting point for this study, which, instead of a traditional close reading of the diary of the 18th-century midwife Martha Ballard, uses topic modelling to mine a digitized transcription of it. The diary covers 27 years, from 1785 to 1812, and contains nearly 10,000 entries, a total of 425,000 words. This case study uses the MALLET toolkit for finding topics. While diaries are exceptionally rich historical resources, they are also extremely challenging and often fragmented, like court records, newspapers, letters, accounting ledgers, etc. The quality of the data is a huge problem. Particularly in diaries, inconsistent spellings (e.g. the word “daughter” is spelled fourteen different ways), abbreviations, and contractions in shorthand all make the text difficult for computers to read. Topic modelling, a method from computational linguistics that attempts to group words together based on their co-occurrence in the text, can transcend this messiness. MALLET identified thirty topics, which were then manually labelled. By applying the modelled topics to each diary entry separately, it was possible to chart the behaviour of certain topics over time.
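The study itself used MALLET; as a minimal, swapped-in illustration of the same idea, the sketch below trains a small LDA topic model with gensim on a handful of invented diary-like entries and prints each entry's topic mixture, which is the quantity that can then be charted over time:

```python
from gensim import corpora, models

# Invented stand-ins for cleaned, tokenized diary entries
entries = [
    ["clear", "day", "worked", "in", "garden", "sowed", "beans"],
    ["called", "to", "mrs", "howard", "safe", "delivered", "of", "a", "daughter"],
    ["knit", "by", "the", "fire", "mended", "stockings"],
    ["rain", "weeded", "garden", "picked", "peas"],
]

dictionary = corpora.Dictionary(entries)
corpus = [dictionary.doc2bow(entry) for entry in entries]

# Train a small LDA model; the actual study used MALLET with thirty topics
lda = models.LdaModel(corpus, num_topics=3, id2word=dictionary,
                      passes=50, random_state=1)

for i, bow in enumerate(corpus):
    # Topic proportions per entry: the values one would chart over the diary's years
    print(i, lda.get_document_topics(bow, minimum_probability=0.0))
```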
Daily scores for certain topics reveal interesting patterns in the yearly distribution of activities such as gardening, knitting, and midwifery, but also reveal interesting patterns over the whole 27 years covered by the diary. Quantification and visualization of these patterns is the real benefit of this method, identifying patterns sometimes not immediately visible in close readings. Even messy sources become manageable with this approach.

“An Ontological View of Canonical Citations” - Matteo Romanello, King's College London

This paper examines scholars' use of canonical citations to Classical (Greek and Latin) texts. Canonical citations provide an abstract reference scheme for texts, similar to the way coordinates express references to places. The paper aims to make use of this long-established practice and, by extracting citations from scholarly papers, to create a domain ontology for Classics. Canonical citations in Classics very much reflect an ontological view of texts, particularly how classicists perceive ancient texts as objects. This work has resulted in the Humanities Citation Ontology (HuCit), an ontology of the semantics of citations in humanistic disciplines. Canonical citations are particularly useful in this respect as they serve precision of identification, persistence of reference, and interoperability across the domain. Citations of Aristotle have been chosen as an application, mainly because of the reliability of the Bekker numbers (named after an important early editor of Aristotle's works), which are used by all Classicists. FRBRoo, the result of a harmonization of FRBR with the CIDOC CRM, is used as the underlying model for the ontology. Canonical citations are thus considered as resolvable pointers into any expression or manifestation of a work. This allows the definition of types of references and the examination of alternative representations of the same reference, and is meant to support interoperability of the tools currently being developed to extract, retrieve, and resolve canonical references. This work builds on related efforts such as CiTO, OpenCyc, and the SWAN ecosystem, and on the work of CTS (Harvard) and CWKB (Cornell).

“Victorian Women Writers Project Revived: A Case Study in Sustainability” - Michelle Dalmau and Angela Courtney, Indiana University Libraries

The Victorian Women Writers Project started in 1995 as a full-text corpus; it was no longer being added to by 2003, was revisited in 2007, and was finally relaunched in 2010. On the level of content, the process involved the migration of the old content to TEI P5; on a structural and institutional level, it involved firmly embedding the project in the University's digital humanities curriculum. The work very much depended on local and faculty expertise, on the exploration and evaluation of projects, and on experimentation with tools. One way of reviving the project was to use it in teaching and research, thereby building up a pool of expertise and encoders and drawing on expertise in the English department. This integration into the curriculum has resulted in a number of student projects in text encoding, in producing secondary contextualizing materials, in exploring digital humanities tools and resources, and in creating an online exhibit. The act of marking up texts has also inspired closer and new readings of texts. Students came to understand the values behind, and the principles and philosophy of, structural and semantic encoding.
This work also led to the development of a prosopography, the addition of an annotation facility, and the creation of critical introductions. Scholarly encoding cultivates technical skills and refines critical thinking and interpretation in students. Evaluation of the class by students was positive, and the collaboration was appreciated. Continuing student involvement is one of the many positive outcomes.

“The Cultural Impact of New Media on American Literary Writing: Refining a Conceptual Framework” - Stephen Paling, School of Library and Information Studies, University of Wisconsin-Madison

The goal of this investigation is to extend social informatics to the study of literature and the arts by conducting a broad-based survey of the American literary community: writers, publishers, and journalists. Four key values were identified as the conceptual framework: positive regard for symbolic capital, negative regard for immediate financial gain, positive regard for autonomy, and positive regard for fresh, innovative work or work only possible electronically (avant-gardism). The study focuses on the emergence of new forms of literary expression offered by information technology and on whether these forms are able to establish themselves alongside traditional forms of expression in American literary writing. The main research questions are: is the American literary community showing positive regard for this new literary innovation, and does it show support for the use of technology in creating these innovations? A survey was sent out to 900 members of the Association of Writers and Writing Programs and the MLA, as well as to representatives of publishers; there were 400 exclusively national respondents. The results show that the vast majority support innovation and original works. However, when technology comes into play these figures plummet dramatically. Generally, print-based output is regarded as having much higher quality and impact than online publications. To conclude, there is general support for a positive regard of these innovative new forms, but intensive use of technology is neither well received as a sole mode of publication nor indeed much produced. Only about 10% of the American literary community demonstrates any intensifying use of technology. There is little evidence of a move of electronic literature into the mainstream, either from a producers' or a publishers' perspective.

Zampolli Lecture

“Re-Imagining Scholarship in the Digital Age” - Chad Gaffield, Professor of History at the University of Ottawa and President of the Social Sciences and Humanities Research Council of Canada

Re-imagining scholarship is prompted by a new era of scholarship, one which draws upon our past but embraces the technologies of today, particularly digital humanities technologies. Scholarship in the digital age is changing and is shaped by the work done in the digital humanities today. Deep conceptual changes are underway in humanities scholarship, and these require a redefinition of our field and the methodologies we embrace. Zampolli pioneered the collaborative aspect of digital scholarship and created the community that we build upon today. The interconnectedness of research is really the key realization in this dynamic field of exploration. Michael B. Katz was influential in the exploration of the use of computers in the 1970s and 80s. Later, groups devoted to textual interpretation emerged and put their mark on the discipline.
Our focus needs to be on the new ways of thinking that are made possible through the use of modern technology. Canada has been innovative in creating a research infrastructure for the country that is qualitatively and quantitatively reflective of the state of the nation and the needs of the community. Computer-based analyses of long-term social change in Canada have a long and fruitful tradition. Beginning in the 1970s and 80s, historical and social research has been influenced by this rich data source, based on decennial census data starting in 1871, political debates, and public discussions. Census data was digitized, OCR'd, marked up, and has been partially made available online (the 1911 census and some samples from earlier censuses, plus some data from 1971 onwards). A number of digital projects have drawn on this rich data source, among them database projects such as the Canadian Social History Project, the Vancouver Island Project, the Lower Manhattan Project, and the Canadian Families Project. To conclude, there are deep conceptual changes afoot that reflect new forms of complexity, diversity, and creativity in a technology-driven age. The digital humanities are ideally placed to tackle some of the more complex problems and to embrace diversity and make it workable, which empowers a larger group of people to be creative in new and exciting ways. Re-imagining scholarship means re-imagining teaching by fostering learning and re-imagining research by broadening horizons and embracing collaboration. The challenges remaining are bridging the solitudes of the arts and humanities, ensuring digital sustainability, managing open innovation and intellectual property, measuring what matters, and giving acknowledgement to the contributions of the digital humanities.

Session seven

“Moving Beyond Anecdotal History” - Fred Gibbs, George Mason University

Walter Houghton's seminal work The Victorian Frame of Mind, 1830-1870 has influenced generations of scholars of the nineteenth century and remains the primary introduction to Victorian thought for students today. Houghton's identification of Victorian traits such as optimism, hero worship, and earnestness, based on the use of particular words, has been influential if not wholly uncritically received. Houghton bases his findings on an examination of hundreds of primary sources of the period, but despite criticisms of anecdotal truths based on elite intellectual history, Victorianists have been unable to thoroughly assess the validity of the assertions or to offer an alternative view. This paper hopes to examine Victorian history via lots of books, thousands of books, instead of just “literature”, by using the resources made available through Google Books. Methodologies are important when answering basic questions and challenging assumptions based on too limited a number of sources. There is, however, a clear tension between rhetoric and practice: the rhetoric emphasises lots of data, tools for everything, visualizing everything, exploring, and offering new interpretations; in practice, we deal with messy data, an underdeveloped understanding of text vs. data, difficult and complex tools, and opaque (even if pretty) visualizations. Further criticisms can be added: bias, sampling problems, unclear significance, and lacunae with unknown consequences. This paper attempts a solution by using and exposing simple techniques (scripting and queries), facilitating not-so-distant reading and an active, contextualized engagement with texts.
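As an illustration of such simple scripted queries (the approach, described next, applies period keywords to book titles), a minimal sketch that counts titles containing a keyword per decade; the CSV layout and column names are invented for illustration:

```python
import csv
from collections import Counter

def keyword_by_decade(csv_path, keyword):
    """Count book titles containing a keyword, grouped by decade of publication."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumed columns: 'title' and 'year'
        for row in csv.DictReader(handle):
            year = int(row["year"])
            if 1790 <= year <= 1910 and keyword.lower() in row["title"].lower():
                counts[(year // 10) * 10] += 1
    return dict(sorted(counts.items()))

# e.g. keyword_by_decade("victorian_titles.csv", "science")
```

Scaled up to millions of records, the same kind of query is what the Elastic MapReduce and Hive setup mentioned below makes practical.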
The approach is to study the use of keywords commonly associated with the period by applying it to the titles of books published between 1790 and 1910. Are there observable peaks in the literature of the period for terms frequently associated with the 19th century, e.g. “revolution”, “heroic”, “faith”, “commerce”, “science”? The Google Books Ngram Viewer <http://ngrams.googlelabs.com/> was used initially but found to be too simplistic, so Amazon's S3, Elastic MapReduce, and Hive services <http://aws.amazon.com/elasticmapreduce/> were used for more detailed explorations, as they offer more flexibility, are cheaper than developing an equivalent yourself, and have produced tried and tested results. The result has been that there is more than one way of setting up queries and visualizing results, all of which are helpful and offer more thorough insights into the new questions we are interested in; e.g. the phrase “science of” reveals the fundamental rise of science in the 19th century as a ubiquitous term, as opposed to the term “science” as we understand it today. As the study only focused on titles, it has been impossible to make any generalizations, but analysis of the full text will be the next step. Results have often been found to be confusing and incomplete, but it is important to highlight the inconsistencies along with the new and exciting questions we are able to ask as a result. There is also a danger of concentrating on deliverables, an emphasis on novelty and production. Glossing over inconsistencies or rationalizing them away poses the huge danger of falling back on the assumptions we are trying to question. To conclude, big data does not require complex tools, and this approach makes the transitions from text to data and back transparent.

“Towards a Narrative GIS” - John McIntosh, University of Oklahoma

Narratives are frequently examined through a time-centric approach, as a series of events; space is less frequently considered in narrative construction and analysis. This NSF-funded project defines a geospatial narrative as a sequence of events in which space and time are of equal importance. The objectives are to automate the collection of space-time events from digitized documents and to develop a data model that can support event-based queries, that can be linked to narratives, and that can support queries based on content. The result is a visualization of the geospatial narrative. The conceptual model is event-based (action, actors, objects, location, time); these events are combined to form narratives. The data sources are two distinct corpora of histories of the Civil War era, Dyer's Compendium of the War of the Rebellion and the Richmond Daily Dispatch. The process involves the automated extraction of information into a database: the source materials are digitized and tokenized, and identified events are extracted along with locations, subjects and objects, and time references. Automation is based on evaluating parts of speech: “action verbs” and “event nouns” are treated as atomic events, and nouns are always important, including proper nouns and time and space nouns. The project's workflow uses Python and the Natural Language Toolkit (NLTK), with tokenization and a part-of-speech tagger. For the purpose of toponym resolution, it seeks to match place names to thesauri to identify the correct real-world references. The main challenges were ambiguity, metonymy, and a lack of comprehensive historical gazetteers.
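A minimal sketch of that kind of NLTK step: tokenizing a sentence, tagging parts of speech, and pulling out verbs, proper nouns, and numbers as candidate event components. The sentence and the crude filters are for illustration only, and the tokenizer and tagger models need to be downloaded once:

```python
import nltk

# One-time downloads of the tokenizer and tagger models
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "General Johnston retreated toward Richmond on 31 May 1862."

tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)  # pairs such as ('retreated', 'VBD'), ('Richmond', 'NNP')

# Crude event components: verbs as candidate actions, proper nouns as
# candidate actors or places, cardinal numbers as candidate time references
actions = [w for w, tag in tagged if tag.startswith("VB")]
proper_nouns = [w for w, tag in tagged if tag == "NNP"]
numbers = [w for w, tag in tagged if tag == "CD"]

print(actions, proper_nouns, numbers)
```

The place-name candidates would then be matched against a gazetteer, which is where the ambiguity and metonymy problems mentioned above arise.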
Named entity recognition and classification is used to group consecutive proper nouns, excluding named entities found in non-spatial phrases or on stop lists. Gazetteer matching is employed to resolve multiple matches through spatial minimization. For evaluation, a sample from each dataset is gone through manually, and the results of the automated process are compared with the hand markup. The project's outputs include a relational database application that allows users to query the database themselves, to query for new event types, and to construct queries for event chains. The project has produced code to extract narrative building blocks, a database for storage, and a query interface that will support querying large datasets for atomic building blocks and combining them into narratives.

“Civil War Washington: An Experiment in Freedom, Integration, and Constraint” - Liz Lorang, University of Nebraska-Lincoln

Civil War Washington is a thematic research collection that explores the historical transformation of Washington, D.C. during the Civil War, when it emerged as a harbour for freed and runaway slaves and was to lead the nation from slavery to freedom and equality for all. The project is related to the Walt Whitman Archive and was originally conceived of as an exploration of Whitman and Lincoln in Washington. The dramatic events surrounding the city in the four years of the Civil War are documented in a vast number of documents. Assembling and digitizing these is only one aspect; the other is analysis, visualization, and tool-building to make the city come alive. The amount of material is huge and the encoding challenging and time-consuming, yet it is a prerequisite for the work undertaken. All data is stored in a relational database to enable connecting up all the different types of documents, people, events, places, institutions, etc. A public interface is currently under development. The technical infrastructure and data models are undergoing revision all the time; they are part of the research process and should be made transparent just like any other assumption. Visualization in the form of a GIS application is another part of the project. The project uses ArcGIS, which has been found to be the easiest to adapt and allows for more analytical investigations of the data, but as a proprietary product it has its own challenges and limitations. Compromises will always be necessary due to the limitations of desktop vs. Web applications. Humanists should be more involved in the development of open GIS tools to make them more appropriate to the field and to the many uses in digital scholarship. Civil War Washington does not want to be a neatly packaged product; instead it wants to stimulate research and to be experimental and transparent in its assumptions and limitations. The interrelatedness of all the data is really the key complexity, challenge, and opportunity.

Session eight

“Googling Ancient Places” - Elton Barker, Open University, et al.

The Google Ancient Places (GAP) project aims to computationally identify places referenced in scholarly books and to deploy the results via simple Web services. The project, funded by the Google Digital Humanities Award programme, is based on Google Books. It facilitates access to ancient places in Google Books and then adds information about the places to the texts.
Session eight

“Googling Ancient Places” - Elton Barker, Open University, et al.

The Google Ancient Places (GAP) project aims to computationally identify places referenced in scholarly books and to deploy the results via simple Web services. The project, funded by the Google Digital Humanities Award programme, is based on Google Books. It facilitates access to ancient places in Google Books and then adds information about those places to the texts. GAP builds on the AHRC-funded HESTIA project but, instead of relying on the heavily marked-up text provided by the Perseus Digital Library, employs computational methods as well as additional open infrastructure in the form of new semantic gazetteers such as GeoNames and the Pleiades Project. GAP also draws on the resources of Open Context, an open-access archaeological data publication system. These tools enable the geo-tagging of the text and the referencing of places in the Google Books corpus via easily resolvable public identifiers. The project is currently at a proof-of-concept stage. Data and technology are freely available and are documented in a blog <http://googleancientplaces.wordpress.com>. The Pleiades gazetteer of ancient places is improved upon by adding modern forms of places found in the Google Books literature. Visualizations have been produced on Google Maps, among them a visualization of books projected onto maps of the ancient world. GAP makes intensive use of existing digital humanities infrastructure and must be seen in the context of collaboration with a number of projects that are all working on similar problems.

“Image Markup Tool 2.0” - Martin Holmes, Humanities Computing Media Centre, University of Victoria

The presenter demonstrated the re-developed Image Markup Tool. Version 1 of the software has been widely used, including in a few major projects. Its simple-to-use interface and full TEI awareness have made it a popular tool in the digital humanities community, but version 1 handles only one image per file, allows only rectangular zones and a one-to-one relationship between zones and divs, essentially constructs the TEI document for the user, and is Windows-based. The improvements in version 2 address these issues. Flexible linking has been added to allow relationships of many divs to one zone and of one div to many zones. Version 2 is also written in C++ and built on the Qt framework, and is therefore cross-platform and open source (hosted on SourceForge). The presenter gave a demo of the interface and functionality of the new version. The new linking mechanism will be based on the TEI linkGrp mechanism and will place the links in the back matter of the document. The desktop version is seen as preferable to server-based solutions because of network latency, the power of a native C++ application, platform-native interface components, and independence from a server infrastructure.
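As a rough illustration of the many-to-many linking just described, and not of the tool's actual output format, the following Python sketch uses the standard library's ElementTree to build a skeletal TEI-like document in which one transcription div is linked to two image zones via a linkGrp in the back matter; details such as the zone coordinates and the linkGrp type are illustrative assumptions.

    # A schematic sketch (not the tool's actual output) of linking one
    # transcription div to two image zones via a TEI linkGrp in the back matter.
    import xml.etree.ElementTree as ET

    XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

    tei = ET.Element("TEI", {"xmlns": "http://www.tei-c.org/ns/1.0"})

    facsimile = ET.SubElement(tei, "facsimile")
    surface = ET.SubElement(facsimile, "surface")
    for zone_id, (ulx, uly, lrx, lry) in [("zone1", (10, 10, 200, 80)),
                                          ("zone2", (10, 100, 200, 180))]:
        ET.SubElement(surface, "zone", {XML_ID: zone_id,
                                        "ulx": str(ulx), "uly": str(uly),
                                        "lrx": str(lrx), "lry": str(lry)})

    text = ET.SubElement(tei, "text")
    body = ET.SubElement(text, "body")
    div = ET.SubElement(body, "div", {XML_ID: "div1"})
    ET.SubElement(div, "p").text = "Transcription of the annotated region."

    # The links themselves live in the back of the document, one per zone-div pair.
    back = ET.SubElement(text, "back")
    link_grp = ET.SubElement(back, "linkGrp", {"type": "imageAnnotation"})
    ET.SubElement(link_grp, "link", {"target": "#zone1 #div1"})
    ET.SubElement(link_grp, "link", {"target": "#zone2 #div1"})

    print(ET.tostring(tei, encoding="unicode"))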
“Lurking in Museums: In Support of Passive Participation” - Susana Smith Bautista, University of Southern California

Lurking is a term popularized with the advent of online communities as a mode of passive participation, e.g. listening, watching, reading, or attending. The museums community has a vital interest in communities around their museums that reflect visitor participation. It is important to recognize that all types of participation are valuable for the constitution of a certain domain, community, or particular project; the key is understanding, motivating, and analysing participation. In the world of museums, lurking is important because it is a principal mode of interaction. Museum websites have long embraced Web 2.0 community-based features, and it has increasingly been recognized that lurking is only initially a passive form of participation: it can change into activity at any time when prompted or sufficiently motivated. Lurkers are also not necessarily loners; there are whole communities that consist of a majority of lurkers and still constitute an important aspect of the whole. If we define knowledge as a process, then actions as part of this process can be internal or external, and both are equally valuable. Lurkers provide an audience and contribute to the social ecosystem through their mere presence. Lurking is clearly not appreciated when participation is required but not given, and social and peer pressures are mechanisms that sometimes make lurking difficult. However, lurking is not opposed to participation; indeed museums must encourage it so that they can nourish, motivate, and transform it into interactivity when required.

“RELIGO: A Relationship System” - Nuria Rodriguez, University of Málaga (Spain) and Dianella Lombardini, Scuola Normale Superiore (Italy)

RELIGO is a relationship system designed to express the nature and characteristics of relations between digital objects hosted in digital libraries and text archives; it is a tool for interpretative study. These relationships can lead to new insights and the creation of knowledge and thus deserve to be treated as research objects in themselves. RELIGO is designed to construct interpretations based on these significant relationships; it relates texts, concepts, words, and visual artefacts. It relates these entities on two logical levels: the expression, i.e. the digital objects, and the semantics, i.e. the digital concepts, that allow the interpretation to be built. A digital concept can therefore itself become a digital object when it is the subject of interpretation. RELIGO makes these connections searchable and browsable; they are represented in the form of navigable graphs, and interpretations can be reconstructed by simply following the paths between the digital objects. Documents can be imported as PDF, images as JPEG, concepts as Topic Maps, and relationships as ontologies; digital objects are represented in XML/SVG. Ongoing work involves metadata management, export and viewing functionality, a Web-based version, content sharing and social tagging, and the management of more object types such as audio and video.
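The presentation did not show an implementation, but the idea of relationships as navigable graphs can be sketched with the third-party networkx library for Python; the objects, concepts, and relation labels below are invented for illustration.

    # A toy sketch of RELIGO-style relationships as a navigable graph, assuming
    # the third-party networkx library; all objects, concepts, and relation
    # labels are invented for illustration.
    import networkx as nx

    g = nx.DiGraph()

    # Digital objects (expression level) and digital concepts (semantic level).
    g.add_node("painting.jpg", kind="digital object")
    g.add_node("treatise.pdf", kind="digital object")
    g.add_node("chiaroscuro", kind="digital concept")

    # Typed relationships between the entities.
    g.add_edge("painting.jpg", "chiaroscuro", relation="exemplifies")
    g.add_edge("chiaroscuro", "treatise.pdf", relation="defined in")

    # An interpretation is reconstructed by following the path between objects.
    path = nx.shortest_path(g, "painting.jpg", "treatise.pdf")
    for source, target in zip(path, path[1:]):
        print(source, "->", g.edges[source, target]["relation"], "->", target)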
“Omeka in the Classroom: The Challenges of Teaching Material Culture in a Digital World” – Allison Marsh, University of South Carolina

It is surprisingly difficult to convince faculty and students of the value of digital research. The digital world, though part of their lives as “digital natives”, is often not considered part of their professional lives. In the museum studies community and its teaching programmes, material culture in the digital world remains a challenge. The presentation reported on experiences with the Omeka open-source software <http://omeka.org/> as a teaching tool for graduate students. The software is used because of its low barriers to use and because it allows for adding content and interpretations. Students on the course are required to produce exhibits using the system, add images and Dublin Core metadata, organize materials, and construct a narrative of the materials. Online exhibits are an excellent way of training the curation of material objects in the digital world and are thus a useful tool in the curriculum. While the results of students' efforts are often lacking in both content and presentation, Omeka has proved to be an excellent pedagogical tool, as using the software has made students aware of the many challenges involved in representing material culture online. One outcome of the experience of teaching the course will be some improvements to the Omeka software.

Closing keynote

“Culturomics: Quantitative Analysis of Culture Using Millions of Digitized Books” – JB Michel & Erez Lieberman-Aiden, Cultural Observatory, Harvard University

In a library we can either read a few books carefully, or read all the books, not very carefully. This presentation offers a quantitative exploration of cultural trends, focussing on linguistic and cultural changes reflected in the English language between 1800 and 2000. How does culture change? How does language change? An example is irregular verbs that have regularized over time. The approach was to investigate grammar books as a starting point to trace the regularization of these verbs. It was discovered that many of the 177 irregular verbs present in Early Modern English were regularized, and in particular that rarely used verbs regularized more quickly than frequently used ones. How can we measure cultural trends more generally? First, we need the world's largest corpus of digitized books, so a sample of 5 million books with good-quality metadata was taken from the 15-million-book Google Books corpus. The aim is to automate the analysis of cultural phenomena over time. N-gram analysis, based on the Google Books Ngram viewer, is used as the method of investigation. Many built-in controls were necessary to verify the quality of the dataset, which was then taken as the data source. A lot of the time the so-called scientific method involves trial and error. There are many problems with certain n-grams, particularly in pre-1800 books: bad dates, OCR errors, noise, corpus bias, and random rubbish are just a few of the issues. But sometimes you can find examples that are worthy of further attention and quantification, and these sometimes reveal interesting results. Censorship is an extremely interesting case: during the Nazi regime, for example, certain names of artists are blocked out completely, while in other countries they develop in the literature as you would expect. The same can be observed during the McCarthy era. The Google Books Ngram viewer was created to make the data and their analysis available to the public, transparent and open. There are even spin-off projects, such as an Ngram viewer for musical notes. We are thus well on the way to a fully fledged Culturomics: we can create huge datasets, we can digitize every text written before 1900, we should create high-quality images of art works, we need to track cultural changes on the Web, and we must make everything available to everybody. We will also need to work together to make all of this work, in large teams distributed all over the world. We need to embrace the expertise of the sciences and use their infrastructure and resources for the humanities. We need to embrace expertise wherever we can find it, from how to read texts to how to interpret our observations. We need to learn to interpret data (from scientists) and to interpret texts (from humanists). We also need to teach humanities students to code and to be quantitatively rigorous in their approach. Culturomics thus extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
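The underlying quantity behind an n-gram plot, the relative frequency of a phrase among all n-grams published in a given year, can be sketched in a few lines of Python; the tiny dated “corpus” below is invented and merely stands in for the real Google Books data.

    # A minimal sketch of per-year relative n-gram frequency, the quantity an
    # n-gram plot is based on; the tiny dated "corpus" is invented.
    from collections import Counter, defaultdict

    corpus = [
        (1861, "the war of the rebellion began and the war spread"),
        (1862, "reports of the war and the army filled the papers"),
    ]

    def bigrams(tokens):
        return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

    counts_per_year = defaultdict(Counter)
    totals_per_year = Counter()

    for year, text in corpus:
        grams = bigrams(text.split())
        counts_per_year[year].update(grams)
        totals_per_year[year] += len(grams)

    # Relative frequency of the bigram "the war" in each year.
    for year in sorted(counts_per_year):
        print(year, counts_per_year[year]["the war"] / totals_per_year[year])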
It was also announced that next year's Digital Humanities conference will be hosted by the University of Hamburg, and the 2013 conference by the University of Nebraska-Lincoln.

12/07/2011 Alexander Huber, University of Oxford <http://users.ox.ac.uk/~bodl0153/>