Digital Humanities 2012 (DH 2012) conference report
Digital Humanities is the annual international conference of the Alliance of Digital Humanities
Organizations (ADHO). ADHO is an umbrella organization whose goals are to promote and support
digital research and teaching across arts and humanities disciplines, drawing together humanists
engaged in digital and computer-assisted research, teaching, creation, dissemination, and beyond, in
all areas reflected by its diverse membership. The 2012 conference was hosted by the University of Hamburg from 16th to 20th July; the conference website is at <http://www.dh2012.uni-hamburg.de/>.
Day 1
Opening plenary
“Dynamics and Diversity: Exploring European and Transnational Perspectives on Digital
Humanities Research Infrastructures”, Claudine Moulin, Trier Centre for Digital Humanities,
University of Trier, Germany
Drawing on her involvement in the European Science Foundation and the Trier Centre for Digital
Humanities, the presenter offered her perspectives on digital humanities research infrastructures
under four global headings: 1. setting a multi-faceted frame: DH and diversity of research; 2. fostering the diversity of languages, methods and interdisciplinary approaches; 3. one more turn: changing research evaluation and publication cultures; 4. interaction of digitality and DH in works of art. Heading 1: The ESF has formed a standing committee for the humanities (SCH) and has
recently published an ESF science policy briefing on research infrastructures in the digital
humanities. This publication was addressed to researchers as well as funding bodies, policy makers,
and key stakeholders. The report identifies key needs of and challenges for practitioners in the field.
A divergence from the natural sciences research environment is noted. Humanities researchers have
long used research infrastructures (RI), starting with the Museion in the 3rd century BC, the first
known information centre. Since then, museums, libraries and archives have continued the task of
providing such an infrastructure for researchers. RIs need to encompass both physical and intellectual networks in humanities research. There are four layers of RIs: physical infrastructure
(collections etc.), digital data infrastructures (repositories), e-infrastructures (networks, computing
facilities), and meta-infrastructures, which aggregate independent RIs with different data formats.
On a macro level, access to data, services, expertise, and facilities is required. RIs therefore require a
multifaceted and multidimensional approach: a set of concurrent criteria for defining them in the
humanities has been drawn up, aligned along the axes of the nature of the objects, collections, and
level of data processing. This ecosystem can be applied to both global and local levels. Dariah, Clarin, TextGrid, TEI, and Bamboo can all be counted as important digital initiatives that constitute RIs.
Multilinguality is a real challenge for this mainly English-speaking and -publishing community; the field needs to make sure that it can reflect and accommodate these linguistic challenges. Heading 2: Europe’s cultural and linguistic diversity is an opportunity, not a defect, and should be recognized
as such. Digital RIs have been developed earlier in the sciences, where the objects of study are less
culturally bound than in the humanities. Humanities researchers tend towards qualitative
methodologies, and taxonomies and ontologies have abounded in the humanities due to these
complexities. An example is the Trier European Linguistic Network, a collaborative effort that links
important national historical dictionaries, and which is now expanding to European dictionaries.
Funding agencies play an important role in ensuring the digitization of the European cultural
heritage, particularly language resources across the board. Heading 3: digital research is still
undervalued in research evaluation programmes. The dividing lines in the traditional humanities
have been broken down in the DH, which has led to some insecurities in the evaluation of research
outputs. Engagement in the community is a prerequisite for understanding the contributions made
by the field. The same applies to the publication culture in this area; the ESF has published a report on this problem. A culture of recognition needs to be instilled that understands the process-oriented character of a DH project and appreciates new formats of publication, such as databases, websites, etc. The development of appropriate instruments for the evaluation of these outputs, a comprehensive clearing mechanism (peer review), and the fostering of interdisciplinary tools and teams, as well as credit and career perspectives for a new generation of young researchers, are crucial. Heading 4: Digitality, the condition of being digital, has become a focus of interest for artists who are taking an active interest in the digital humanities. Artists are creating excellent visualizations, e.g. Ecke Bonk’s installation and visualization of the Grimm Dictionary, which displays the full richness and complexity of the work. Another example of digitality in practice is the iFormations art exhibition, originally from British Columbia, which has been brought to Hamburg on the occasion of DH2012.
Day 2
Session 1
“Code-Generation Techniques for XML Collections Interoperability”, Stephen Ramsay and Brian
Pytlik-Zillig, University of Nebraska-Lincoln, USA
Code generation is a mode of software development that is designed to create adaptable and quick
solutions to changing requirements based on varying source documents. The key problem is text
collection interoperability. While TEI is a widely-adopted standard, it cannot guarantee
interoperability between collections. TEI succeeds quite well in allowing for the interchange of encoded texts, but is far from solving interoperability issues when combining collections: complexity and interoperability pull in different directions. Even within the Text Creation Partnership corpora, there are interoperability issues. Without interoperability, however, we end up with silos once again. Perl, sed, and the bash shell are often used to tweak things into shape. A more stable solution to the problem is code generation: the generated target schema usually contains everything we need to make collections interoperable. XSLT is one possible choice for code generation, as it is almost a meta-language that can also be useful for documentation purposes. The tool that has been developed for this purpose is called the Abbot Text Interoperability Tool (on github.com); it addresses many of the outlined issues and ensures interoperability among text collections. The language Clojure was used to wrap the XSLT code so that it performs well in an HPC environment. The system scales well and works with large numbers of XML documents.
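As an illustration of the general idea only (not of Abbot's actual implementation, which is written in Clojure and XSLT), the following minimal Python sketch generates an XSLT stylesheet from a hypothetical element-renaming table; the RENAME mapping and the function are invented for the example.

# Hedged sketch: derive conversion code (an XSLT stylesheet) from a declarative
# description of the target format, here a simple element-renaming table.
RENAME = {"head": "title", "lg": "linegroup"}  # assumed mapping, illustration only

TEMPLATE = """  <xsl:template match="{src}">
    <{dst}><xsl:apply-templates select="@*|node()"/></{dst}>
  </xsl:template>"""

def generate_stylesheet(rename_map):
    """Emit an XSLT 1.0 identity transform plus one template per renamed element."""
    rules = "\n".join(TEMPLATE.format(src=s, dst=d) for s, d in rename_map.items())
    return f"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity transform: copy everything not matched by a more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
{rules}
</xsl:stylesheet>"""

if __name__ == "__main__":
    print(generate_stylesheet(RENAME))

The generated stylesheet can then be applied to every document in a collection, which is the appeal of the approach: the conversion logic is produced, not hand-written, and can be regenerated whenever the target schema changes.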
“DiaView: Visualise Cultural Change in Diachronic Corpora”, David Beavan, UCL, UK
The presenter introduced DiaView, a tool to investigate and visualize word usage in diachronic
corpora. Starting point: the Google Books/ngram viewer is great when you know what you are
looking for, but if you don’t, it’s tricky to start out. Google Books OCR quality is poor, the corpus
does not evenly sample across genres, the chronological placements are questionable, and it is a
very large corpus. DiaView is intended to address these issues. For demonstration purposes, the tool has been used with the English One Million corpus on Google Books, dated 1850 to the present, which still contains over 98 billion tokens, so very infrequently used words were also
filtered out. Statistical analysis of word frequency distribution across the entire corpus is then
compared to its frequency by publication year, which highlights any distributions that are skewed or
focussed on a particular chronological range. DiaView is meant to be easy to use, can aggregate and
summarize data, promote browsing and opportunistic discovery, help discover cultural trends,
highlight interesting terms, provide links to more in-depth analysis, inspect corpus by decade, and
provide an ability to work with any corpora or any dataset. Rather than relying on raw word frequency, DiaView calculates salience, applies visual styles, and creates links back to the ngram viewer for in-depth analysis of discovered phenomena. Essentially, DiaView looks for interesting
peaks in the data that help with the initial task of discovering noteworthy phenomena in the corpus.
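The talk did not spell out the exact statistic behind the salience calculation, but a minimal sketch of the general idea, comparing a word's share of one year's tokens with its share of the whole corpus, might look as follows; the function and the toy data are invented.

import math
from collections import Counter, defaultdict

# Hedged sketch of a salience measure in the spirit of DiaView: a word is salient
# in a year if it is over-represented there relative to the corpus as a whole.
def yearly_salience(tokens_by_year):
    """tokens_by_year: {year: [token, ...]} -> {year: [(word, log-ratio), ...]}"""
    corpus = Counter()
    for toks in tokens_by_year.values():
        corpus.update(toks)
    total = sum(corpus.values())
    scores = defaultdict(list)
    for year, toks in tokens_by_year.items():
        counts, n = Counter(toks), len(toks)
        for word, c in counts.items():
            expected = corpus[word] / total   # share in the whole corpus
            observed = c / n                  # share in this year
            scores[year].append((word, math.log(observed / expected)))
        scores[year].sort(key=lambda x: x[1], reverse=True)
    return scores

data = {1850: "whale ship ship sea".split(), 1900: "engine ship steam steam".split()}
print(yearly_salience(data)[1900][:2])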
“The Programming Historian 2: A Participatory Textbook”, Adam H. Crymble, Network in Canadian
History & Environment, Canada
The Programming Historian 2 (PH2) is an open access methodology textbook targeted at ordinary working humanists interested in programming. It is intended to be instantly useful for the work they do. The first iteration of the textbook was targeted at historians alone and focussed on Python as a programming language. It was successful, but the second version is now much more widely targeted, and new lessons were commissioned from members of the DH community. PH2 is not really targeted at academics; it helps with visualizations and with using simple but very powerful command line tools. PH2 employs rigorous testing and peer review to ensure that users can use resources with confidence. The presenter invited the audience to contribute their expertise to the textbook; authors get proper attribution and make a valuable contribution to digital scholarship. Ideas include: R, Ruby, Python, tools, metadata, XML, textual and image analysis. A good lesson is structured to offer success and a working example by the end of a 60-minute lesson. Lessons have an obvious goal:
short time, quick success, for a novice but intelligent audience. Lessons can also be conducted in a
classroom setting, with assignments and group discussions. Online at
<http://programminghistorian.org/>.
“XML-Print: an Ergonomic Typesetting System for Complex Text Structures”, Martin Sievers, Trier
Centre for Digital Humanities (Kompetenzzentrum), Germany
This DFG-funded project has developed its own XSL-FO engine to facilitate typesetting of XML documents that meets the requirements of professional typesetting conventions. The typesetting system takes XML as a source document and also works with semantically annotated data; it provides a modern graphical interface, rule-based formatting, etc. Its output is an XSL-FO stylesheet that is used to generate PDFs. The tool is currently a stand-alone application, but ultimately it will be a web service and will also be integrated into the TextGrid tool box. With the style editor it is possible to apply layout information (or formats) to the XML source, with detailed formatting options similar to those in a word processing application. The style editor highlights the active transformation mappings, but also offers alternative formats, which can be explored very easily. Mappings can be based on the elements in the XML tree and can be narrowed by attributes and positions within the tree. The style editor has a modern user interface, and the style engine is based on standard XML technologies and Unicode. XML-Print addresses key requirements facing scholars publishing their data today.
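To illustrate the kind of mapping such a style editor maintains (XML elements to formatting properties, ultimately rendered via XSL-FO), here is a hedged Python sketch using lxml; the STYLES table and element names are assumptions, and a real workflow would hand the generated XSL-FO to a formatting engine to produce the PDF.

from lxml import etree

FO = "http://www.w3.org/1999/XSL/Format"

# Invented element-to-format mapping, standing in for the style editor's rules.
STYLES = {"head": {"font-size": "14pt", "font-weight": "bold"},
          "p":    {"font-size": "10pt"}}

def to_fo(source_xml):
    """Turn a flat TEI-like <div> of <head>/<p> elements into XSL-FO blocks."""
    src = etree.fromstring(source_xml)
    root = etree.Element("{%s}root" % FO, nsmap={"fo": FO})
    # (a full stylesheet would also emit fo:layout-master-set and fo:page-sequence)
    for el in src:
        block = etree.SubElement(root, "{%s}block" % FO, attrib=STYLES.get(el.tag, {}))
        block.text = el.text or ""
    return etree.tostring(root, pretty_print=True).decode()

print(to_fo("<div><head>Chapter 1</head><p>Some text.</p></div>"))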
“The potential of using crowd-sourced data to re-explore the demography of Victorian Britain”,
Oliver William Duke-Williams, University of Leeds, UK
The presenter introduced the FreeCEN project, an effort to explore the potential of a set of crowd-sourced data based on the returns from decennial censuses in nineteenth-century Britain, starting in 1801. The census questions asked then were very different from those asked today; the census of 1841 saw the transition to a household-based census, and from 1851 there were increasingly mature
administrative structures in place. From 1891 onwards more focus was put on employment and
tracking class structures. Census information was aggregated by statisticians from enumerators' books kept in the households, and reports to parliament were produced from these. From 1841
onwards, the data is held at the National Archives. The FreeCEN project aims to open up this data
from publicly available sources by a volunteer effort to transcribe them from the original sources.
Much of this data is currently only available through for-fee commercial services. The sample data used was from Norfolk, which comprises 40,000 samples, 4,500 occupations, and 199 different
‘relations to head of family’. Processing steps include automatic data clean-up, e.g. normalizations
(sex, age, county of birth, parish), but quality and comprehensiveness of the data will vary by area
and transcriber as data is created by an existing volunteer-based effort. The sample also looks at
lifetime migrations, which Ravenstein used for his 1885 Laws of Migration. A particular challenge has been the encoding of places, which will need more work. The interesting question is: can we produce interesting new graphs in their historical contexts that address questions not previously asked? Results will be published via the ESRC census programme
<http://www.census.ac.uk/> and will be made publicly available.
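The automatic clean-up step might, in spirit, look like the following sketch; the lookup tables, field names and rules here are invented stand-ins, not FreeCEN's actual normalization code.

import re

# Hedged sketch of the kind of normalization described (sex, age, county of birth).
SEX = {"m": "M", "male": "M", "f": "F", "fem": "F", "female": "F"}
COUNTY = {"nfk": "Norfolk", "norfolk": "Norfolk", "sfk": "Suffolk"}

def normalise_record(rec):
    """rec: dict with raw 'sex', 'age', 'birth_county' strings from a transcription."""
    out = dict(rec)
    out["sex"] = SEX.get(rec.get("sex", "").strip().lower(), "unknown")
    # ages were sometimes recorded as e.g. '3 mo' for infants; keep whole years only
    m = re.match(r"\s*(\d+)", rec.get("age", ""))
    out["age"] = int(m.group(1)) if m else None
    out["birth_county"] = COUNTY.get(rec.get("birth_county", "").strip().lower(),
                                     rec.get("birth_county", "").strip().title())
    return out

print(normalise_record({"sex": "fem", "age": "42", "birth_county": "nfk"}))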
Session 2
“Engaging the Museum Space: Mobilising Visitor Engagement with Digital Content Creation”,
Claire Ross et al., UCL, UK
Public engagement is crucial for museums. It is a vital component of teaching, learning and research,
which are the three key tasks that museums perform. Digital public engagement is a new aspect of
this task. There has been a massive explosion of digital content in museums and mobile applications
in particular are an excellent way of involving the public digitally. Some key issues remain: how do
digital resources improve the museum experience most? This paper reports on a case study, QRator, a collaborative project developing new content, co-curated by members of the public and academics. The collection used is “Dead Space”, a large natural history collection. QR tags are displayed prominently throughout the collection and visitors can contribute their own materials; iPads are available in the museum, but users can also download the mobile app and use it on their own smart phones and digital devices. The app is based on the Oxfam Shelflife app that lets you attach a story to your donated items. QRator records responses to items and also to a number of provocative questions that were associated with the objects. Responses were categorized; they include comments on the museum, comments on the topic/question, and noise, of which there was not too much. The system is unmoderated, but filters out some profanity; the general impression has been that the public will respond well to engaging questions and context for the objects on display. Even one-word responses to the objects are sometimes illuminating. Further investigation reveals that the majority of the comments on topics express opinions, general comments and specific comments on the object. Museums are opening up their collections to the public more, and this is crucial for public engagement; the digital humanities as a discipline can learn a lot from this development. In future it will also be possible to take photos and add captions, and even short videos, to individual objects.
“Enriching Digital Libraries Contents with SemLib Semantic Annotation System”,
Marco Grassi et al., Università Politecnica delle Marche, Italy
The digital evolution has made huge amounts of digital data available, and increasingly that data is semantically enriched and published on the Web for re-use and further annotation. Annotation of Web content is a very useful activity; what is missing are clear semantic annotations that allow us to unambiguously tag items, which can improve the digital library experience for users. Users should be
empowered to create knowledge graphs that rely on controlled vocabularies and ontologies, and
link out to the Web of Data for additional enrichment. Pundit is the name of the tool created for this
purpose, a novel semantic annotation tool developed as part of the EU-funded SemLib project.
Pundit relies on the Open Annotation Collaboration (OAC) ontology, which allows for wide
interoperability of annotations. Named Graphs are employed to capture the annotation content,
which makes it possible to query just slices of information using SPARQL. Annotations are collected
in so-called notebooks, which can be shared as URLs on the Web. They can also be shared on social
media sites, but individual access restrictions can be applied as well. The authentication mechanism
is based on OpenID. The hoped-for result is a huge knowledge base surrounding objects in a Digital
Library environment. Named Content is another feature of Pundit: specific markup can be added to identify atomic pieces of information, so that the same piece of information can appear in a variety of contexts. Pundit is a RESTful Web service, based on CORS and JSON. Pundit can annotate all sorts of content (text, images, audio, and video); SemTube is an experimental implementation of annotation of YouTube videos. Pundit also allows for custom vocabularies/ontologies to incorporate
content from wider areas and different domains. Pundit is available at <thepund.it>.
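To illustrate how named graphs make it possible to query just a slice of an annotation store with SPARQL, here is a small sketch using Python's rdflib; the notebook and annotation URIs are invented, and the vocabulary is only loosely modelled on the OAC ontology mentioned above, not on Pundit's actual data model.

from rdflib import ConjunctiveGraph, Namespace, URIRef

OA = Namespace("http://www.w3.org/ns/oa#")  # assumption: an OA-style vocabulary

g = ConjunctiveGraph()
notebook = URIRef("http://example.org/notebooks/1")   # hypothetical notebook URI
ctx = g.get_context(notebook)                          # one named graph per notebook
ctx.add((URIRef("http://example.org/anno/1"), OA.hasTarget,
         URIRef("http://example.org/page/42")))

# Query only the slice of annotations stored in that notebook's named graph.
q = """
SELECT ?anno ?target WHERE {
  GRAPH <http://example.org/notebooks/1> {
    ?anno <http://www.w3.org/ns/oa#hasTarget> ?target
  }
}
"""
for row in g.query(q):
    print(row.anno, row.target)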
“Digital Humanities in the Classroom: Introducing a New Editing Platform for Source Documents in
Classics”, Marie-Claire Beaulieu, Tufts University, USA
The presenter introduced a new online teaching and research platform that enables students to
collaboratively transcribe, edit, and translate Latin manuscripts and Greek inscriptions. The platform
needs to allow for the editing and study of large amounts of source documents and make
documents widely available. The pedagogical needs include hands-on learning, inclusion of non-canonical texts, a collaborative learning environment, and the ability to form part of a student’s online
portfolio (undergraduate and postgraduate students). The project was piloted in 2012, editing and
translating the Tisch Miscellany, but the workflow with Word documents and PDFs was
unsatisfactory. The new system has been tested with a set of uncatalogued medieval MSS and early
printed books. Other sources can be used from Creative Commons sources. A special concern has
been the visual representation: these texts need to be approached as physical objects as well as
texts. The platform adopted the Son of Suda Online (SoSOL) software for editing, and the CITE services (‘Collections, Indexes, and Texts, with Extensions’) developed by the Homer Multitext Project for image citation. SoSOL was developed by the papyri.info team; it supports collaborative editing and the TEI-based EpiDoc XML standard. CITE services link the resources in the platform and offer URNs
to identify the citation as well as the association between the image to be transcribed and the XML
transcription. All texts are archived in the Perseus Digital Library and the Tufts institutional
repository. The inscriptions will be available on Perseus as a new collection. A version history is also
kept. Limitations include the plugin technology used, but this will be addressed by a move to jQuery
in the future. Integration of SoSOL and CITE has been technically challenging and interoperability
hasn’t been easy. In conclusion, the project demonstrated an interesting integration of teaching and
research, an introduction of DH methodologies into the classroom, creation of publicly available
tools, and leveraged expertise from a range of Tufts university units. It has been an interesting
virtual lab setting experiment.
Session 3
“Myopia: A Visualization Tool in Support of Close Reading”, Laura Mandell, Texas A&M University,
USA
This paper describes a collaboration between an interdisciplinary group of researchers. Myopia is a
close-reading tool to analyse poetic structures and diction. The texts that have been used as a basis
are from the Poetess Archive. There have been visualizations of the Poetess Archive content before,
the aim of this tool is to benefit from the time-consuming activity of close-reading by feeding the
resulting markup into a tool that seeks to amplify our understanding and facilitates discovery of new
knowledge. Myopia is a desktop tool built in Python that visualizes poetic structures and analyses metre and tropographical encoding. On mouseover, the software identifies each metrical foot and
displays its characteristics as part of a line of verse or stanza. This has only been done in detail for
one poem: Keats' “Ode on a Grecian Urn”: metre encoding, trope encoding, sound encoding, and
syntax encoding have been produced as TEI P5-conformant stand-off markup. The tool allows for a
layered display of these various textual features, and also for the text to be hidden, which is useful
for comparing poems. Stresses are also visualized through flashing text. Originally the encodings were all in one TEI document, but this turned out to be too complex; they are now separated out
into four XML documents. Overlapping hierarchies are a major obstacle in many of these XML
encodings. Close reading can be a very satisfying endeavour in its restraint that allows for narrowing
down meaning to significance (as Stanley Fish demanded in his criticism of the digital humanities),
but the interesting part is that authorial intentionality is unknowable, and sometimes is “made” by
critics and readers. Finding the critical means to determine this intentionality is where computer-assisted close reading needs to go. Discovering patterns is clearly a more important task than focussing too narrowly on features that are pronounced unique, but are unconvincing on closer
inspection.
“Patchworks and Field-Boundaries: Visualizing the History of English”, Marc Alexander, University
of Glasgow, UK
This paper uses the database of the Historical Thesaurus of English developed over 44 years and
published in 2009 to visualize change in the history of English, and in particular in the English lexicon.
The corpus exists both in print and as a database at the University of Glasgow. The database is a
massive computational resource for analyzing the recorded words of English with regards to both
their meaning and dates of use. By combining visualization techniques with the high-quality
humanities data provided in the HT, it is possible to give scholars a long-range view of change in the
history, culture and experiences of the English-speaking peoples as seen through their language. A
tree-mapping algorithm was used to visualize the database. Colour has been used to highlight OE
(dark), and newer uses of a word (light yellows). The visualization of the coloured map of English
words shows that there are patterns to be discovered inside of particular linguistic areas and key
stages, such as in Old English, Middle English, and Modern English. This approach can also be
extended to mappings e.g. of metaphors and how they develop over a period of time or certain
category growth over a period of time. Categories such as electromagnetism and money,
newspapers, geology, chemistry grow in the 18th century, but others such as faith, moral, courage
decline. Salience measures comparing a separate corpus of Thomas Jefferson’s works against a reference corpus show surprising examples: categories such as conduct, greatness, order, and leisure are above average in his writings, but lots of things one would expect are missing, such as farming and agriculture. It will be interesting to see if the database will allow for similar investigations into etymology. It is hoped that the data will be licensed by the OED to allow free use for academic purposes.
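As a flavour of the tree-mapping idea (not the project's actual layout algorithm, colouring or data), the following sketch lays out a handful of invented category sizes with a simple slice-and-dice treemap.

# Hedged sketch of a "slice and dice" tree-map layout; labels and sizes are invented.
def slice_and_dice(items, x, y, w, h, vertical=True):
    """items: [(label, size)]; returns [(label, (x, y, w, h))] rectangles."""
    total = sum(size for _, size in items)
    rects, offset = [], 0.0
    for label, size in items:
        share = size / total
        if vertical:   # split the available width proportionally
            rects.append((label, (x + offset, y, w * share, h)))
            offset += w * share
        else:          # split the available height proportionally
            rects.append((label, (x, y + offset, w, h * share)))
            offset += h * share
    return rects

categories = [("the world", 40), ("the mind", 35), ("society", 25)]
for label, rect in slice_and_dice(categories, 0, 0, 100, 100):
    print(f"{label:12s} {rect}")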
“The Differentiation of Genres in Eighteenth- and Nineteenth-Century English Literature”, Ted
Underwood, University of Illinois, Urbana-Champaign, USA
This is a collaborative project from the University of Illinois. The general strategy is corpus
comparison, a very simple approach, but it is hard to interpret phenomena, and inherent significance
is not easily identifiable. It is easier to interpret comparisons that are spread out over a time axis. It
removes the argument of an “ahistorical” significance and lets us focus on specific phenomena in
specific periods that can be more easily tried and tested. Categories such as prose and poetry, and drama and fiction, have initially been compared, e.g. via the yearly ratio of words that entered the English language between 1150 and 1699 in prose and poetry. We see that the curve is steadily going up in
poetry, but is flattening in prose. The text corpus of 4,275 volumes was drawn from ECCO-TCP, Black
Women Writers, and Internet Archive. OCR was corrected, notes were stripped, focus has been on
the top 10,000 words in the collection, stopwords have been excluded. By narrowing the analysis,
we realize that the poetry corpus ranks highest by far in this comparison, followed by prose fiction,
while the curve for non-fiction prose flattens dramatically. Drama follows the same pattern: it is much higher than non-fiction, on a similar level to prose fiction. We need to ask ourselves what we know about the genres involved. Genre is not as easily distinguishable in 1700 as it is now, and we see that the differentiation of diction runs in parallel with the distinguishing of literary genres. Also, we need to ask what we know about the etymological metric. E.g. the use of Latin-origin words in the period between 1066 and 1250 is lower than that of OE words, as English is still primarily a spoken language; later a more learned vocabulary develops for written English. In our interpretation
we need to be careful not to overestimate the metric we chose for this analysis; we cannot really know if this is the right metric. We could try to mine other associations from the trend itself to gain further insights. Observing correlations of words over time is a good means of statistical exploration of a corpus; in poetry, for example, we can see correlations with words relating to personal experience: the domestic, the subjective, the physical and the natural. It also reveals a broad transformation of diction, which is not necessarily easily verifiable otherwise, and this analysis offers a first significant means of exploring these trends.
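The metric itself is straightforward to sketch: for the tokens of a given year's volumes, take the share whose first attestation falls in the chosen window. The tiny attestation lookup below is an invented stand-in for the historical-dictionary data such a study would actually use.

# Hedged sketch: share of tokens first attested between 1150 and 1699, per year.
FIRST_ATTESTED = {"king": 900, "honour": 1200, "fancy": 1465, "telegraph": 1794}  # toy data

def post_1150_ratio(tokens, lo=1150, hi=1699):
    dated = [FIRST_ATTESTED[t] for t in tokens if t in FIRST_ATTESTED]
    if not dated:
        return None
    return sum(lo <= d <= hi for d in dated) / len(dated)

volumes_by_year = {1760: ["king", "honour", "fancy"], 1820: ["king", "telegraph"]}
for year, toks in sorted(volumes_by_year.items()):
    print(year, post_1150_ratio(toks))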
Day 3
Session 1
“Literary Wikis: Crowd-sourcing the Analysis and Annotation of Pynchon, Eco and Others”, Erik
Ketzan, Institut für Deutsche Sprache, Germany
This presentation reported on the use of Wikis for the purpose of annotating complex literary texts
collaboratively. The frequent unattributed references in Umberto Eco’s Mysterious Flame of Queen
Loana were the starting point for the annotation project, particularly to annotate all the literary and
cultural references to 1940s and 1950s Italy in the novel. This project gathered some 30-40
annotators who produced more than 400 entries. A second example is Thomas Pynchon's Against
the Day (2006), which prompted the creation of <pynchonwiki.com>. It allowed for A-Z as well as page-by-page entries and had more than 400 contributors, who produced an annotation corpus of over a million words. Speed was the main advantage of the Wiki over the book: within weeks major long novels were thoroughly annotated and referenced. As in Wikipedia, a hard core of users do most of the work. It was important to establish the expected quality of the entries at the start; people then produced entries of a similar quality. Thus, community guidance, while minimal, proved crucial. Placing question marks after entries was a great trick to solicit annotations and prompt people to contribute. Contributors take things personally; they are emotionally engaged and need to be treated appropriately. People liked the page-by-page annotations, as there is a sense of noticeable
progress and a prospect of completion. E-book readers now do some of what literary wikis are
about, and literary wikis are a great way to remove the sometimes artificial divide between
academia and the public. Wikis are, however, not easily extensible to all authors; they work best with authors that already have a vibrant online community. No community really evolved around the Wikis themselves: people came, edited and left.
“Social Network Analysis and Visualization in The Papers of Thomas Jefferson”, Lauren Frederica
Klein, Georgia Institute of Technology, USA
The source materials of this project are The Papers of Thomas Jefferson, digital edition (Virginia). The
project applies social network analysis to Jefferson’s papers in order to visualize the resources of a
historical archive and to illuminate the relationships among people mentioned in the resource. The
project is part of a larger project to apply natural language processing and topic modelling to the
works of Thomas Jefferson. The sources are full of references to people, particularly in Jefferson's
letters. Often these references are obscured by abbreviations, by the use of first names only, by
terms of endearment, etc. In tracing these references, the project is hoping to overcome the
phenomenon of archival silence for many of the persons referenced in these writings, particularly
regarding enslaved men and women. Arc diagrams can be produced to visualize the relationships of people in the correspondence; these can then be grouped into letters to family, political correspondents, friends in Virginia, enslaved staff, international correspondents etc. These visualizations let us view the monolithic corpus of Jefferson’s letters with new eyes; they highlight the main components of the correspondence, but also the omissions and silences. Digital Humanities
techniques can really help us to address some of the issues associated with the discovery of
connections, which are otherwise difficult to trace and establish.
“Texts in Motion - Rethinking Reader Annotations in Online Literary Texts”, Kurt E. Fendt and Ayse
Gursay, MIT, USA
This project is a collaboration within the multidisciplinary HyperStudio unit at MIT
<hyperstudio.mit.edu>, a research and teaching unit on digital humanities and comparative media
studies. The key principles of the unit are based on educational and research needs and usually involve co-design with faculty, students, etc., in an agile development process with an integrated feedback loop. Students in the unit are considered novice scholars; engagement of learners in the teaching and research processes is crucial. The need for this project arose out of the question of how we can make literary texts much more accessible, thus supporting students in understanding original sources, with a focus on the process of analysing literary texts. The resulting product that addresses this need is Annotation Studio, a flexible graphical annotation and visualization tool. It is based on reader-response theory, identifying different types of readers, considering readers as collaborators, and treating meaning as a process, not a product. The tool is trying to make the private process of
interaction with a text visible. Engaging students to become editors, and hence “writerly readers” is
key to the process. Annotation is a powerful mechanism for engaging with texts as writers and
readers, focussing the engagement on a close-reading level. The technologies used keep text and
annotations separate and support flexible annotation formats. The tool is built on open source
technologies, and implemented in JavaScript and Ruby on Rails. It will be open sourced at
<hyperstudio.github.com>. Annotation Studio was funded by an NEH digital humanities start-up
grant.
“Developing Transcultural Competence in the Study of World Literatures: Golden Age Literature
Glossary Online (GALGO)”, Nuria Alonso Garcia and Alison Caplan, Providence College, USA
The focus of the presenters' engagement with the digital humanities is its pedagogical implications,
particularly in foreign language classrooms. The main aim is to increase engagement of students
with foreign language texts and remove some of the obstacles to this engagement. The presenters
introduced an online searchable glossary, GALGO, of key words from the literature of the Spanish Siglo de Oro. Based on keyword theory, the process involves defining the keyword, tracing the keyword in context (identifying keywords in close proximity), and clustering the keyword (grouping multiple keywords in categories focused on culture and society). Digital classroom applications need
to be very focussed to achieve a real benefit in a teaching situation. GALGO is a work in progress; its pedagogical potential will evolve and be improved over time. The glossary offers contextualized definitions as well as entries for all the meanings of a word independently of context. All texts will be hyperlinked to entries, so users can look up words in the original contexts from which the definition has been taken. The glossary therefore has an intra- as well as an intertextual dimension; learners engage with classic texts and test their hypotheses regarding the polysemic value of Golden Age concepts. Learners engage with meanings in a contextual manner, which is more helpful than just basic word definitions. The tool has been conceived as a supplement to the printed texts used in the classroom; it is intended as a reference tool to support students in the process of paper writing and
similar tasks in a classroom context.
Session 2
“Aiding the Interpretation of Ancient Documents”, Henriette Roued-Cunliffe, Centre for the Study
of Ancient Documents, University of Oxford, UK
Supported by the eSAD project and an AHRC-funded doctoral studentship, the presenter
demonstrated how Decision Support Systems (DSS) can aid the interpretation of ancient documents.
The research conducted centres on Greek and Latin documents, particularly the Vindolanda tablets,
and is intended to support papyrologists, epigraphers, and palaeographers, and will potentially be
useful to readers of old texts from other cultures. Decisions in the humanities are often
interpretations, frequently subjective, often not well supported or quantifiable, and difficult to map
as a structure. Computers can aid in this process, but they do not make decisions. This is where DSS comes in: it supports, but does not replace, decision-makers.
expert systems, but this is not what we are aiming for. DSS are situated somewhere in the middle
between expert systems and individual readings. DSS allows for the recording of initial findings and
provides an opportunity to annotate those readings and preliminary insights. The APPELLO word search web service was developed to search through XML-encoded texts, matching a pattern of characters and returning possible matches to that pattern. The DSS prototype is mainly a proof-of-concept. A case study was used for this purpose: the DSS offers a structure that enables scholars to remember their decisions and thus aids them in their further investigation. The DSS application was
built around the tasks that readers spend most time on, such as identifying characters, looking for
word patterns, and contextual information on passages. The DSS lets scholars record their
arguments rather than force them to make decisions. DSS can aid the reading of ancient documents
by helping to record the complex reasoning behind each interpretation. A future DSS will most likely
be layered, where individual layers can be turned on and off, e.g. original image, enhanced image,
transcription, structure, meaning, interpretation. Developing the DSS as a component of a larger
collaborative VRE would be an ideal approach.
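The word-pattern search that APPELLO provides can be illustrated, very loosely, with a few lines of Python; the word list, the wildcard syntax and the function are invented for the example and do not reflect the service's real interface.

import re

# Hedged sketch of a word-pattern search: given a partially read word in which
# some characters are uncertain or missing, return candidate completions.
WORDS = ["legio", "legatus", "litterae", "miles", "militibus"]  # toy word list

def candidates(pattern, words=WORDS):
    """'.' stands for one unknown character, '*' for any run of characters."""
    regex = re.compile("^" + pattern.replace("*", ".*") + "$")
    return [w for w in words if regex.match(w)]

print(candidates("l.g*"))      # e.g. a damaged word read as l?g...
print(candidates("mil*bus"))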
“Reasoning about Genesis or The Mechanical Philologist”, Moritz Wissenbach et al., University of
Würzburg, Germany
This presentation examines the decision-making processes involved in establishing the genesis of a
literary work, the result of which is a genetic edition. The text in question is Goethe’s Faust. There is
only one critical edition of the work in existence, which dates from 1887. A new edition is currently
being prepared by a group of scholars. There are c. 500 archival units/MSS extant, but only very few of them are dated; dating these witnesses is a major task.
relations between the MSS, as is taking into consideration a century of published Faust scholarship.
Dating MSS requires some methodological reflection: if no absolute chronology can be established, we must rely on a relative chronology. Evidence for dating can be explicit dates, material properties (paper, ink), external cues (mentions), and the “logic”/dynamic of genesis inherent to the text. Computer-aided dating is a tool for editors that can also provide the same tool and data to users, so that they can follow the editor’s argument. The main task is to formalize the practice of dating using formal logic. Central notions/rules for dating are: “syntagmatic precedence”, where we hypothesize about the precedence of one version of a text over another; “paradigmatic containment”, where, if a text is contained in another text, we assume that the contained text is earlier than the containing text; and “exclusive containment”, which says that if a text is exclusively contained by another, then the containing text is earlier than the included text. To these rule sets, we need to add the
knowledge from research that has already been done. New research will also be added, and then all these resulting graphs of relations can be combined; any inferred relations can be overridden by actually established ones. Inference rules are formalized hypotheses about an author's writing
habits; the measures for these are coverage, recall, and accuracy. Goethe's writing habits are far from linear, and a formal logic approach for the purpose of rule prioritization seems to offer a promising qualitative analytical framework.
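To make the three dating rules concrete, the sketch below turns invented containment and precedence observations into edges of a relative-chronology graph and computes which witnesses must be later than a given one; it is only a toy rendering of the formalization described, not the project's code, and the witness sigla are hypothetical.

from collections import defaultdict

# Invented toy observations: H1's text is contained in H2 (paradigmatic containment),
# and an explicit note tells us H0 precedes H1 (syntagmatic precedence).
contains = {"H2": {"H1"}}
syntagmatic = [("H0", "H1")]

def precedence_edges(contains, syntagmatic, exclusive=()):
    """Turn the three rules into 'earlier -> later' edges."""
    edges = set(syntagmatic)                    # syntagmatic precedence
    for outer, inners in contains.items():
        for inner in inners:
            edges.add((inner, outer))           # contained text is earlier than the containing one
    for outer, inner in exclusive:
        edges.add((outer, inner))               # exclusive containment: containing text is earlier
    return edges

def later_than(edges):
    """Transitive closure: for each witness, everything that must be later than it."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    changed = True
    while changed:
        changed = False
        for a in list(succ):
            new = set().union(*(succ[b] for b in succ[a])) if succ[a] else set()
            if not new <= succ[a]:
                succ[a] |= new
                changed = True
    return succ

print(sorted(later_than(precedence_edges(contains, syntagmatic))["H0"]))  # ['H1', 'H2']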
“On the dual nature of written texts and its implications for the encoding of genetic manuscripts”,
Gerrit Brüning, Freies Deutsches Hochstift, Germany, and Katrin Henzel, Klassik Stiftung Weimar,
Germany
Textuality is at the heart of many of the questions surrounding genetic encoding. The notion of
written text is at the centre of these deliberations. The TEI was one of the earliest initiatives to capture textual features, but it only marginally covered genetic processes; these have only recently been incorporated in the latest version of the TEI P5 Guidelines. But these new elements conflict
with the old text-oriented markup, a clarification of these relations needs to be reached to marry
these different layers of textual encoding. One way of approaching this is to focus on the concept of
“written text”, the materialized version of a text that is the result of the writing process. Written text
has to be considered as a linguistic object that is not easily separated from its materiality. The
physical object is an inscription on a material surface. The two dimensions of materiality and
textuality are not easily integrated and each must be given its proper place. Documentary vs textual encoding is the main issue; we need what is now in chapter 20 of the TEI Guidelines, i.e. non-hierarchical markup. But genetic markup goes further than that, and it appears that some of the clarifications need to be based on a revisiting of the basic notions of textuality and “written text”.
Intermingling of a documentary and a textual perspective complicates both the recording of
information on the writing process and the subsequent processing to generate e.g. diplomatic and
reading text versions. For the purposes of genetic encoding, we need to differentiate between text positions and textual items, which might help to spot, clarify, and resolve these common conflicts.
Session 3
“Violence and the Digital Humanities Text as Pharmakon”, Adam James Bradley, The University of
Waterloo, Canada
This paper theorizes on the process of visualization as both a destructive and creative task,
destructive as it replaces the studied object, creative as it shifts the aesthetic of the text for re-interpretation. The presenter has experimented with visualizations of eighteenth-century thought, particularly Diderot’s philosophy, which is based on the concept of nature: all matter acts and reacts, and when relationships are created, the notions of structure, form and function are also created.
The accuracy of these suppositions is to be confirmed experimentally. In Diderot, enthusiasm is key
to the process of perception that results in inspiration. The process of visualization goes through
three steps according to Diderot: a pure visualization that is enticing inspiration, a defamiliarization
of the supposedly known, and a creation process evolving from this. The artist in this way helps to
abstract from the text to a three-dimensional model that allows for an exploration along different
paths. Using a mathematical approach, the presenter introduced a new way of visualization that
maps every word of a text into a three-dimensional space. The three-dimensional space is created by a base-26 number line that, according to the mathematician Cantor, has an exact representation in a three-dimensional space. This location information means that any text can be rendered three-dimensionally; very large corpora can thus be visualized and explored in new and interesting ways. This type of visualization is non-destructive, as it creates a one-to-one relationship between the visualization and the original text, which allows for an easier transition between the original text and the visualization.
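One possible, purely illustrative realization of the base-26 idea is to read each word as a sequence of letter values and deal them alternately onto the three axes; the construction below is an assumption made for the sake of the example and may differ from the mapping actually used in the paper.

# Hedged sketch: letter values 1-26 accumulated as base-27 digits, interleaved
# across x, y and z, so that each word gets its own point in 3-D space.
def word_to_point(word):
    coords = [0, 0, 0]
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    for i, c in enumerate(letters):
        coords[i % 3] = coords[i % 3] * 27 + (ord(c) - ord("a") + 1)
    return tuple(coords)

for w in ["urn", "grecian", "ode"]:
    print(w, word_to_point(w))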
“Recovering the Recovered Text: Diversity, Canon Building, and Digital Studies”, Amy Earhart,
Texas A&M University, USA
Diversity of the literary canon in America has been a long-standing issue in literary criticism.
Digitization and the Web were hailed as a democratizing medium that would widen the canon and incorporate women’s writing to a far higher degree than was previously possible. A lot of early, unfunded, simple projects sprang up in the late nineties, producing a wide range of “recovery web pages” that highlighted recovered literature by women and minorities. Many of these projects are already disappearing, and this poses a real issue as the resources they presented are not easily retrievable or accessible. Many of these projects were conducted by single researchers outside of a digital infrastructure and outside the digital humanities community. The disappearance of the projects is also due to funding infrastructures that are not conducive to small-scale work on lesser-known writers and texts; there is little impact factor for such endeavours. We need
some basic preservation infrastructures to preserve these projects. Incorporation of such smaller
projects into larger currently funded infrastructures would be a good possibility for preservation.
Existing structures of support need to be supplemented by a social movement to raise awareness
about these projects and exploit their resources to maintain a wider canon in the digital humanities
world than we have today.
“Code sprints and Infrastructure”, Doug Reside, New York Public Library, USA
Small projects have sometimes achieved in a small way what large infrastructure projects have not
yet achieved, namely to establish communities that develop commonly needed tools collaboratively.
Methods such as rapid prototyping, gathering “scholar-programmers” and hackathons are designed
to foster such communities. These working meetings, or “code sprints”, are designed to address concrete requirements of a particular community or group of scholars. But code sprints also have problems, such as a lack of code-sharing mechanisms, differing coding languages and dialects, lack of focus and documentation, and differing assumptions about goals. Big infrastructure programmes follow a waterfall development model (plan, then do): requirements, design, implementation, verification, maintenance. Agile development, by contrast, is circular, iterative and quick. Interedition is an example of a
successful small product that was driven very much by a community that evolved around it.
Organized in the form of boot camps, these meetings focused on transcription, annotation, and
collation as primary scholarly tasks in producing (digital) scholarly editions that could effectively be
supported by common models and tools. The Interedition boot camps have resulted in various new
tools in the form of web services – of which CollateX is probably the best known – and considerable progress in the development of existing tools (Juxta, eLaborate etc.). Small agile developments could be very useful in the digital humanities as a loosely organized community that is characterized by
flexibility and enthusiasm. Microservices seem to be a good means of development in this
community that can be developed from the bottom up and involve contributors from a variety of
backgrounds, e.g. humanistic, technical, design/visualization experts. It is possible that large
infrastructure projects such as Bamboo, Dariah, and TextGrid could provide the organizational and
administrative infrastructure needed to make code sprints more effective and their work more
sustainable.
Session 4
“Developing the spatial humanities: Geo-spatial technologies as a platform for cross-disciplinary
scholarship”, David Bodenhamer, The Polis Center at IUPUI, USA, et al.
Key themes of this panel were an effort to reconcile the epistemological frameworks of the
humanities and GIS to locate common ground for cooperation, designing and framing spatial
narratives about individual and collective human experience, building increasingly more complex
maps of the visible and invisible aspects of place. What we are aiming for are “Deep Maps”, a
dynamic virtual environment for analysing and experiencing the reciprocal influence of real and
conceptual space on human culture and society: spatially and temporally scaled; semantically and
visually rich; open-ended, but not wide open; problem-focused; curated and integrative; immersive,
contingent, discoverable; support of spatial narratives and spatial arguments through reflexive
pathways; subversive.
“GIS and spatial history: Railways in space and time, 1850-1930”, Robert Schwartz, Mount Holyoke
College, USA
GIS can contribute to the spatial humanities as demonstrated by this work on the expansion of
railways in France in the 19th century. GIS is great at developing spatial and temporal patterns that
can be queried inside of a specific cultural context and background. Contributions of GIS to
humanities include: large scale comparisons across national borders, tracking multi-scalar change,
combining multiple research domains, multiple kinds of sources and evidence, and spatio-temporal
patterns. Its limitations include that it is a reductionist technology if left uncontextualized, thus
narrowing your research questions, and has a tendency to facilitate simplicity and restrict
complexity. In the context of the railways project, the questions to be asked are not so much how far apart cities were, but what the human experience of travelling those distances was, what the journey was like, and whether it was common or unusual. The expansion of the railway system during the 19th
century meant a transformation of both spatial and temporal proximity not only between places but
also from people’s homes to the nearest railway station. What did this accessibility mean and why
does it matter? From a humanistic point of view the changes in people’s perceptions about space
and time are more important than all the socio-economic implications the expansion of the railway
system brought about. GIS identifies patterns of spatial and temporal interconnectivity, provides
geo-historical contexts for teasing out meanings and explanations. GIS is most useful as an auxiliary
methodology in support of humanistic enquiry.
“Spatial Humanities: Texts, GIS, Places”, Ian Gregory, Lancaster University, UK
GIS is an integrating technology that can accommodate different methodological approaches from a
number of disciplines. GIS has been used to trace infant mortality rates in Great Britain in the 19th
century and map the areas that had the biggest decline in infant mortality and those areas which
had the least progress in reducing infant mortality rates. This does not offer us any explanation as to
what the reasons for this are, but this is not a problem of GIS itself but of the source data available
for the purpose. Instead we need to look at the literature of the time to find out what the reasons
for certain phenomena are. Corpus linguistic techniques can help us out with collocation techniques
to find associations between certain places and topics of investigation during certain periods.
Mapping the Lakes is another example that will be integrated in this research, a comparison of the
tours of the Lakes done by Gray and Coleridge in 1769 and 1802 respectively. The ERC-funded
Spatial Humanities project wants to bridge the quantitative/qualitative divide; it also wants to build up the skills base, establishing a PhD studentship etc. GIS provides us with a new way of looking at texts, identifying patterns and offering relations between these observations and the patterns discovered.
“Mapping the City in Film: a geo-historical analysis”, Julia Hallam, University of Liverpool, UK
The year 1897 was the beginning of film-making in Liverpool and a previous project traced the
development of the city in film. This work involved the use of maps from an early stage, particularly
as a means of comparing urban developments in various parts of the city. This has led to the
“Mapping the City in Film” project that aims to develop a GIS-based resource for archival research,
critical spatial analysis, and interactive public engagement. The aim is to map the relationship
between space in film and urban geographic space. GIS allows us to navigate the spatial histories attached to various landscapes in film, to develop dialogic and interactive forms of spatio-cultural engagement with the local film heritage, and to facilitate public engagement in psycho-geographic narratives of memory and identity around film, space and place. The result of the project is an
installation in the Museum of Liverpool of this integrated map. It offers anchors for attaching films to
it and recording experiences of space in film. GIS has several advantages: layering the cine-spatial
frame-by-frame data, informing multi-disciplinary perspectives and practices that contextualize
historical geographies of film by mapping a virtual multi-layered space of representation: filmic,
architectural, socio-economic, cultural, etc.
Day 4
Session 1
“Intertextuality and Influence in the Age of Enlightenment: Sequence Alignment Applications for
Humanities Research”, Glenn H. Roe, University of Oxford, UK
The presenter reported on the findings of a project conducted in the contexts of the ARTFL project,
Chicago, and the Oxford E-Research Centre. Textual influences and intertextuality are important
areas of literary study. Relations between texts are complicated and multi-faceted, and it is a key element of humanistic endeavour to trace these links, from direct quotes to “influences” and allusions. To examine the genetics of intertextuality, we are borrowing an idea from microbiology, namely sequence alignment: a general technique to identify regions of similarity shared by two sequences, aka the longest common substring problem. There are applications of this methodology in many domains. Key advantages of the approach are that it respects text order, does not require pre-identified segments, can align similarity directly rather than as blocks of text, and spans variations in similar passages (insertions, deletions, spelling, OCR and other errors, other variations). The result of this work is a software package called PhiloLine/PAIR, now an open-source module with the PhiloLogic software. The software is based on n-grams, currently tri- and quad-grams; filtering is used as well as stemming. Parameters that can be set include span and gap, i.e. the minimum number of n-grams
considered a match, and the maximum number of unmatched n-grams allowed within a match
respectively. Results are stored in individual files sorted chronologically by year of document
creation. Each link contains bibliographic information. The planned output is XML, possibly TEI
reference indicators. As a use case, the Encyclopedie has been analysed for Voltaire’s presence in it. This has long been a problematic area: there are lots of quotations and references, but most of them are unattributed, and Voltaire has been described as “absently present” in the Encyclopedie. This has
led critics to believe Voltaire was sceptical of the endeavour and the agenda of its editor,
contributing only 45 articles. However, the tool used here has discovered well over 10,000 matching
sequences between Voltaire's Complete Works and the Encyclopedie, thus demonstrating Voltaire's
textual presence as an authority over and against his more restrained role as an encyclopedic
author. This approach to intertextuality thus avoids the problems of distant non-reading and lets you focus on relevant passages, identifying passages that allow for the fruitful engagement of academics with the influences and intertextualities of a period.
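The matching strategy described (shared n-grams, with span and gap thresholds) can be sketched in a few lines; PhiloLine/PAIR's real implementation adds filtering, stemming and much more, and the parameter handling below is only an approximation of the description given in the talk.

# Hedged sketch of n-gram based passage matching with 'span' and 'gap' parameters.
def ngrams(tokens, n=3):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def aligned_passages(a_tokens, b_tokens, n=3, span=3, gap=2):
    """Return (start, end) regions of a_tokens sharing >= `span` n-grams with
    b_tokens, allowing up to `gap` consecutive unmatched n-grams inside a match."""
    b_set = set(ngrams(b_tokens, n))
    matches, run, miss, start = [], 0, 0, None
    for i, g in enumerate(ngrams(a_tokens, n)):
        if g in b_set:
            run, miss = run + 1, 0
            if start is None:
                start = i
        else:
            miss += 1
            if miss > gap:
                if run >= span:
                    matches.append((start, i - miss + n))
                run, miss, start = 0, 0, None
    if run >= span:
        matches.append((start, len(a_tokens)))
    return matches

a = "the best of all possible worlds said the philosopher".split()
b = "in the best of all possible worlds everything is for the best".split()
print(aligned_passages(a, b, n=3, span=2, gap=1))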
“Trees of Texts - Models and methods for an updated theory of medieval text stemmatology”, Tara
Lee Andrews, Katholieke Universiteit Leuven, Belgium
Stemmatology is the discipline of deducing the copying order of texts on the basis of relations
between the surviving manuscript witnesses. There are different approaches, the most popular of which is the “method of Lachmann”, a 19th-century methodology that has some significant drawbacks, which resulted in a new approach termed Neo-Lachmann. The latest approach employs phylogenetic methods, essentially the sequencing of variations, but the issue of the significance of any observed phenomena remains a major problem. Too much a priori judgement, or no judgement at all,
are the two extremes. We need an empirical model for text variation. The goal of this research is to
arrive at an empirical model for medieval text transmission, a formalized means of deducing copying
relationships between texts. To this end we must evaluate a stemma hypothesis based on the
variations in a text tradition, and we must evaluate variants in a tradition according to a given
stemma hypothesis. When we model text variation programmatically, we end up with sequence
graphs for the various witnesses of a text, which we can then use to map relationships. We also need
to model witness relationships in the form of stemma graphs, but handling witness corrections is
complex as soon as several witnesses are involved, as all possible relations of unclear/unverified
corrections need to be encoded. Different types of variants in texts such as coincidental and
reverted variants all need to be modelled in stemma graphs. The project has developed a tool that
evaluates sets of variants against a stemma hypothesis of arbitrary complexity, testing whether any
particular set of variants aligns with the stemma hypothesis, and measuring the relative stability of
particular readings. This method offers a means of introducing statistical probability by taking all the
available evidence of a text into account, thus removing the limitations of using scholarly instinct
alone. The Stemmatology software is available for re-use at <https://github.com/tla/stemmatology>.
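A very reduced version of one genealogical test in this spirit is to ask whether the witnesses sharing a reading form a connected subgraph of the stemma, so that the reading need not have arisen more than once; the sketch below, with an invented stemma, illustrates that check rather than the project's actual evaluation logic.

import networkx as nx

# Hedged sketch: an invented stemma hypothesis as an undirected graph.
stemma = nx.Graph([("archetype", "A"), ("archetype", "B"), ("A", "C"), ("A", "D")])

def variant_is_consistent(stemma, witnesses_with_reading):
    """True if the witnesses carrying a reading form a connected part of the stemma."""
    nodes = set(witnesses_with_reading)
    return nx.is_connected(stemma.subgraph(nodes)) if nodes else True

print(variant_is_consistent(stemma, {"C", "D", "A"}))  # True: one origin suffices
print(variant_is_consistent(stemma, {"C", "B"}))       # False: implies coincidence or contamination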
“Contextual factors in literary quality judgments: A quantitative analysis of an online writing
community”, Peter Boot, Huygens Institute of Netherlands History, The Netherlands
Online writing communities, communities whose members submit and discuss their stories, offer interesting datasets for digital humanities investigation. This paper focusses on a Dutch-language online writing community that is unfortunately no longer active. The different types of texts and the comments on the texts are at the centre of the investigation. The available data (acquired by a Web crawl) consists of 60,000 texts, 2,435 authors, 350,000 comments, 450,000 responses, and 150,000 ratings. It is not always clear whether works were changed, works could have been removed, some communication is missing, and people can always have multiple accounts. The focus of research has been on the comments: number of words, word categories, relations. The average comment length was 44 words; 4,400 comments had more than 300 words, sometimes more than 2,000 words. Most people who wrote stories also wrote comments; some people only commented. There are seven major
categories in comments: greetings, praise, critical vocabulary, negative comments, site-related
vocabulary, names of members, and emoticons. Members of the writing community can then be
characterized according to their behaviour with regard to the categories. Analysis of commenter-author pairs is interesting; the relationship depends on the networking activity of the author and their agreement on style and interests. Writing communities have long been ignored in the humanities; audience studies is an interesting field that could be fruitfully explored in the digital humanities, but it is currently left to psychologists, sociologists, and marketeers. Research potentials include analysis of the impact of reading and reader response, growth in the canon, style in relation to appreciation, stylistics (development, imitation), style as a function of text genre, critical influences from published criticism, and the dynamics of reading and writing groups. Research limitations, despite plentiful data, include: no controlled experimental data, completeness, data manipulation, ethical and rights issues, the challenge of text analysis tools, and the fact that data is no substitute for
theory. Beyond online writing communities, it would be interesting to investigate online book
discussions, and other online communities around other forms of art.
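As an illustration of the comment analysis described above (and not the study's actual pipeline), the following minimal Python sketch tags comments against small category lexicons and computes the average comment length in words; the Dutch word lists are invented placeholders standing in for a few of the seven categories.

    import re

    # Placeholder lexicons standing in for a few of the seven comment categories.
    LEXICONS = {
        "greeting": {"hoi", "hallo", "groetjes"},
        "praise":   {"mooi", "prachtig", "knap"},
        "critical": {"plot", "spanning", "perspectief"},
    }

    def categorize(comment):
        """Return the categories whose lexicon words occur in the comment."""
        words = set(re.findall(r"\w+", comment.lower()))
        return {cat for cat, lexicon in LEXICONS.items() if words & lexicon}

    comments = [
        "Hoi, mooi verhaal!",
        "De spanning valt wat weg na het eerste hoofdstuk.",
    ]
    print([categorize(c) for c in comments])                  # category sets per comment
    lengths = [len(re.findall(r"\w+", c)) for c in comments]
    print(sum(lengths) / len(lengths))                        # average length in words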
Session 2
“A flexible model for the collaborative annotation of digitized literary works”, Cesar Ruiz and
Amelia Sanz-Cabrerizo, Universidad Complutense de Madrid, Spain
The presenters introduced @note, a collaborative annotation tool for literary texts. The tool is a
collaborative effort between humanists and computer scientists. The context for this project is the
university's collaboration with the Google digitization effort in 2000. The resulting texts aren't useful to researchers as they are: reliable texts are needed, with a full set of bibliographic information, corrected OCR, and tools that provide easier access to the digitized material. Without these enhancements there is a risk that Spain's cultural heritage will be ignored in the Google universe and by the
Google generation. The project adopted a bottom-up approach, focussing on user needs. The basic
principle is to use the digitized image as the starting point and to make it annotable. The project ran
as part of the Google digital humanities start-up grant programme. @note promotes the
collaborative annotation of text by both teachers and students: a flexible and adaptable model that
allows for varying interests and varying levels of expertise. The @note annotation model is user-centric; roles include those of annotator and annotation manager. The @note system has been implemented in HTML5. The administration area has a number of admin functions; the annotation manager can create annotation taxonomies, groups, tags, etc. Books are retrieved intuitively via a search function; annotations are then listed alongside the work, with anchors on the annotated image that appear on mouseover. There can also be comments on annotations, which appear indented.
Annotations can also be filtered logically according to rule sets that can be defined. The project is
still a work in progress: the integration of annotations into e-learning environments and the transfer of academic annotations into a research-support environment are high on the list of desirable additions. Retrieval
of texts from other libraries is being investigated. Also, a suggestion system for concepts and tags is
in development. Students’ e-writing skills are an important area of development and a project like
this can aid in this endeavour.
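The rule-based filtering of annotations mentioned above can be pictured with a minimal sketch; the record structure and rule format below are invented for illustration and are not @note's actual data model.

    from dataclasses import dataclass, field

    @dataclass
    class Annotation:
        annotator: str
        tags: set = field(default_factory=set)   # taxonomy terms assigned by the annotator
        note: str = ""

    def matches(annotation, rule):
        """rule = ('and' | 'or', [tags]): require all, or any, of the tags."""
        operator, tags = rule
        hits = [tag in annotation.tags for tag in tags]
        return all(hits) if operator == "and" else any(hits)

    annotations = [
        Annotation("student1", {"metaphor", "chapter-1"}, "extended metaphor"),
        Annotation("teacher", {"historical-context"}, "historical background note"),
    ]
    rule = ("and", ["metaphor", "chapter-1"])
    print([a.note for a in annotations if matches(a, rule)])   # ['extended metaphor']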
“Digital editions with eLaborate: from practice to theory”, Karina van Dalen-Oskam, Huygens
Institute for the History of the Netherlands – Royal Dutch Academy of Arts and Sciences, The
Netherlands
The subtitle of the presentation, “From practice to theory”, is intentional: users influence the
definition of the underlying principles of the eLaborate tool. The software has been developed by
Huygens ING, a research organization with a strong focus on textual scholarship. The institute offers
consultancy and produces scholarly editions. Most texts chosen for the editing platform relate to
Huygens ING's own research topics; text analysis for digital textual studies is high on the research agenda. The aim is to produce a corpus of high-quality electronic texts. The starting point is basic marked-up texts, which can be enhanced at any point; crowd-sourcing is used for transcription projects. Editions are often still produced in Word and the transformation to online editions is lengthy, but the use of a common platform is making this task easier. Features can then be added to the platform as a whole and filter through to all editions in the same environment. Text analysis tools are also being developed as part of this process. The system is called eLaborate, a flexible
online environment which offers an editorial workspace for editors. Side-by-side display of
manuscript image, transcriptions, translation, annotations, and metadata is available. The system
has been tested with various groups of users, including volunteers with an academic background and
university students in textual scholarship courses. Users can hide annotations by other users or the editor, but the editor cannot: he or she needs to make a scholarly decision about what to do with these annotations and how to categorize and use them for a particular edition. Diplomatic and critical transcriptions are currently produced for each edition; in an ideal case both would be generated from the same source. There is a need for several layers and types of annotation, including personalized, authorized, and editable annotations. The editor's role changes in this environment: he or she becomes a moderator and teacher, and tasks can be divided among the crowd based on
expertise. The eLaborate software is available to external projects as long as there is a shared
interest in the texts. eLaborate is not (yet) intended to be open-sourced for a variety of technical and
institutional reasons.
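The “single source, two views” ideal mentioned above can be pictured with a minimal sketch (not eLaborate's data model): each token is transcribed once with its manuscript form and an optional editorial normalization, and the diplomatic and critical texts are both derived from that single record.

    # Invented toy tokens: 'orig' is the manuscript form, 'norm' an optional expansion.
    tokens = [
        {"orig": "dñs", "norm": "dominus"},
        {"orig": "dixit"},
        {"orig": "ihrlm", "norm": "ierusalem"},
    ]

    diplomatic = " ".join(t["orig"] for t in tokens)
    critical = " ".join(t.get("norm", t["orig"]) for t in tokens)
    print(diplomatic)   # dñs dixit ihrlm
    print(critical)     # dominus dixit ierusalem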
“Wiki Technologies for Semantic Publication of Old Russian Charters”, Aleksey Varfolomeyev,
Petrozavodsk State University, Russian Federation, and Aleksandrs Ivanovs, Daugavpils University,
Latvia
This project is investigating the application of semantic publication principles to historical
documents. Semantic description levels include the palaeographic level, the internal structure, and
semantically interconnected documentary evidence. Semantic publications, which are provided with additional information layers, represent knowledge about documents in a formalized way. Based on semantic Web technologies (semantic networks, triples, and ontologies), semantically enhanced historical records can form the basis for a VRE (virtual research environment), and can represent both the physical form and the authentic texts (diplomatic transcriptions) of historical records, as well as their translations into modern languages. Semantic MediaWiki has been adopted and adapted for this purpose. The
Wiki markup language works well with semantic annotation as it allows for the named annotation of
textual features. The corpus consists of Old Russian charters from the late 12th to the early
17th centuries from the Latvian National Archives and the Latvian State Historical Archives.
<Histdocs.referata.com> is the home for these historical documents. MediaWiki extensions such as
XML2WIKI and WebFonts are helping to overcome some of the issues encountered during the
process, particularly working with encoded full text and Unicode. The pages in the Semantic MediaWiki can also be used to visualize, firstly, the semantic network of relations between the charters' texts and, secondly, links to external historical records and research publications. Within the semantic network, the facts about the charters that are recorded with Semantic MediaWiki tools can be automatically transformed into RDF triples. Wiki systems can therefore be used for the production of semantic publications of charters and other written documents.
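As an illustration of that transformation (Semantic MediaWiki ships its own RDF export; this is not the project's code), the sketch below takes facts of the kind an annotation such as [[issued by::Novgorod]] would record and serializes them as RDF triples with the rdflib library. The namespace, charter name, and property names are invented placeholders.

    from rdflib import Graph, Literal, Namespace

    HD = Namespace("http://histdocs.referata.com/wiki/")       # placeholder namespace
    facts = {"issued_by": "Novgorod", "date": "1264", "recipient": "Riga"}

    graph = Graph()
    charter = HD["Charter_of_1264"]                             # placeholder page name
    for prop, value in facts.items():
        graph.add((charter, HD[prop], Literal(value)))          # one triple per recorded fact

    print(graph.serialize(format="turtle"))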
Session 3
“Connecting European Women Writers. The Selma Lagerlöf Archive and Women Writers
Database”, Jenny Bergenmar and Leif-Jöran Olsson, University of Gothenburg, Sweden
The Selma Lagerlöf Archive (SLA) is an attempt to make the collected writings of this important
Swedish author available, including her literary and epistolary writing. The SLA makes the Lagerlöf
collection at the National Library of Sweden available to a wider audience and aims to create a wider
cultural and historical context for these documents. Aims include creating a digital scholarly edition of her works, digitizing her collections at the National Library of Sweden as well as contextual materials in other archives, and establishing a bibliographical and research database. A first demo
version of the archive is currently under development. Linking the SLA to the Women Writers
Database <www.databasewomenwriters.nl> is a means of achieving some of these aims. The links between Lagerlöf and the wider context of women writers in the late 19th century are only partially implemented, and the integration of resources into a European women writers database could be beneficial in tracing and visualizing these links. To achieve interoperability and data exchange
between the archive and the database, a meeting was held to discuss a suitable data model. A
minimal set of entities was agreed on, and the model now facilitates relations between people, between works, and between people and works, and also provides information on holding institutions. It is now possible to share XML records between the project and the WomenWriters database via a single API. Underlying the system is eXist-db, with Apache Tika for binary file format data extraction and indexing. A set of microservices has been created that can query, extract, manipulate, and reuse the data in the XML database. There is also the option to publish resources in a machine-processable way. Collation of texts and variants can be recorded relatively easily with tools like Juxta and Collate. Reception histories and the discourses surrounding individual texts would also be a fruitful area of enquiry, as would translations of the works into other languages. For small projects, collaboration with larger projects is extremely helpful, as is the use of existing tools and platforms for the standards employed.
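A microservice of the kind described above might, for example, pull records out of eXist-db over its standard REST interface. The sketch below assumes that interface with an invented host, collection path, and element names; it is not the project's actual API.

    import requests

    # Placeholder XQuery against an invented collection of person records.
    XQUERY = """
    for $p in collection('/db/sla/persons')//person[@gender = 'female']
    return <name>{string($p/name)}</name>
    """

    response = requests.get(
        "http://localhost:8080/exist/rest/db/sla/persons",    # placeholder eXist-db endpoint
        params={"_query": XQUERY, "_howmany": 25},            # eXist REST query parameters
        timeout=10,
    )
    response.raise_for_status()
    print(response.text)   # an <exist:result> wrapper containing the matching <name> elements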
“Modeling Gender: The ‘Rise and Rise’ of the Australian Woman Novelist”, Katherine Bode,
Australian National University, Australia
This paper uses quantitative and computational methods to trace gender trends in contemporary
Australian novels, drawing on the resources of the government-funded AustLit database. AustLit is
the most comprehensive online bibliography of Australian literature; it contains details of over 700,000 works and items of secondary literature. As AustLit adheres to established bibliographical standards, it provides an excellent source for computational analysis. Methodologically related to Moretti's distant reading paradigm, the approach used here avoids the methodological void of Moretti's approach. While AustLit is an impressive resource, it is by no means complete and updates are intermittent. In addition, genre and national complexities make a precise delineation impossible.
There has been plenty of criticism of Moretti’s paradigm, mainly because of the lack of
documentation about the definition of basic terms and genre boundaries in the data used for his
work. Criticism of literary statistics centres on the imperfections of the representations of the
literary fields produced. McCarty focuses more on practical applications in his book Humanities Computing (2005). The data modelling approach used by the presenter to explain gender trends responds to Moretti's assertion that gender trends in literature are complementary and based on different talents in men and women writers. The problem is that such statements do not take into account the whole picture, particularly non-canonical literature. The presenter demonstrated that
by using a combination of book historical and digital humanities methodologies, it was possible to
trace the rise of women authors between 1960 and 1980, a trend usually attributed to second-wave
feminism which deconstructed the male canon. On closer inspection though, the largest area of
growth during this period was in the genre of romance fiction, a form of writing usually seen as
inimical to second-wave feminism. Another trend that can be observed is that while women now
account for more than half of all Australian novels, critical discussion in newspapers and scholarship has
declined over the same period. The presenter concluded that digital humanities can offer a valuable
contribution to the field of humanities by not only questioning scholarly assumptions, but also by
challenging unqueried institutionalized procedures and systems.
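The kind of aggregation behind such claims can be sketched as follows, with invented records standing in for AustLit data: the yearly share of novels by women, overall and within a single genre.

    import pandas as pd

    # Invented toy records; a real analysis would draw on AustLit bibliography exports.
    novels = pd.DataFrame([
        {"year": 1965, "gender": "female", "genre": "romance"},
        {"year": 1965, "gender": "male",   "genre": "literary"},
        {"year": 1975, "gender": "female", "genre": "romance"},
        {"year": 1975, "gender": "female", "genre": "literary"},
    ])

    female = novels["gender"].eq("female")
    print(female.groupby(novels["year"]).mean())              # share of novels by women per year

    romance = novels[novels["genre"] == "romance"]
    print(romance["gender"].eq("female")
                 .groupby(romance["year"]).mean())            # the same share within one genre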
Closing plenary
“Embracing a Distant View of the Digital Humanities”, Masahiro Shimoda, Department of Indian
Philosophy and Buddhist Studies/ Center for Evolving Humanities, Graduate School of Humanities
and Sociology, University of Tokyo, Japan
Humanities research, just like digital humanities research, needs to be aware of its roots and the
relations between its predecessors and successors, as well as relations between languages and
cultures. The relation between cultures in their succession is a key humanistic research interest, and
it is equally applicable to the digital humanities. The humanities have two types of diversity: the
object of study and the method of research. Diversity in the digital humanities is amplified by the
inclusion of technology and the multiplicity of associated computational methodologies. Stability in the digital humanities is one of the major challenges, given how rapidly technology changes. The discovery of the East in the 18th century as a subject for humanistic research coincided with the evolution of some of the principal and long-standing methodological principles in the humanities generally. Distant perceptions, both spatial and temporal, have allowed the humanities to establish a view of Eastern and ancient cultures that is detached from personal views and experiences. This is a paradigm that fits well with digital humanistic enquiry. The digital
humanities' technological foundations have an equally distancing effect on the objects of study. The
full-text digitization of one of the major Chinese Buddhist corpora, Taisho Shinshu Daizokyo, with
well over a hundred million characters in 85 volumes, has transformed both access to and our understanding of these texts in their cultural contexts. The digital humanities have a huge opportunity in the digitisation and analysis of these vast resources, and in revising some of the long-standing traditional humanistic scholarship that has dominated the discourse for the past 100 years.
A bird's eye view of the many interconnections between these resources is a fascinating way of
approaching the Buddhist textual, cultural and religious heritage. Buddhist scriptures are also
challenging some of the traditional perceptions of “written texts”: the scriptures not only usually have several titles but also multiple authors, some of whom may be purely legendary. We need to go beyond the boundaries of individual texts to corpus analysis in order to highlight the importance of connections in traditions of writing materialized in individual scriptures. From a broader perspective,
the traditional humanities will not be succeeded by the digital humanities, but the field will be
enriched and will enrich our understanding of our cultural heritage.
Next year's Digital Humanities conference (DH 2013) will be hosted by the University of Nebraska-Lincoln, USA, and DH 2014 will be hosted by the University of Lausanne, Switzerland.
03/08/2012 – Alexander Huber, Digital Collections Development, Bodleian Libraries, University of
Oxford