In this video I describe how I built a network graph to visualize how declarations of unknowing
appear in Old English texts. I want to break the process down into five steps:
1. I searched for these declarations in the Dictionary of Old English Corpus;
2. I created and analyzed a small data set of passages;
3. I came up with a way of representing these passages as a network graph;
4. I then translated the data to a form Gephi can understand;
5. and I prepared the visualization in Gephi.
Part 1: Mining the Dictionary of Old English Corpus
This article begins with the Dictionary of Old English Corpus, the resource that underlies my
research. The Dictionary of Old English Corpus contains at least one version of every known
Old English text. In my project, I used its latest, most powerfully searchable, not yet released
2014 version to search for the combinations of words that mark out declarations of unknowing.
As I note in my paper, I searched the DOE Corpus for the words “nis (n)ænig” (there is no one),
which mark out the declarations of unknowing that I am looking for. The search returns all
citations from the Old English corpus in which the two words appear—in any order.
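The any-order matching described above can be pictured with a minimal Python sketch. This is not the DOE Corpus search engine: the function name and the sample strings are invented placeholders, and the sketch assumes exact word forms only, with no inflected or spelling variants.

```python
import re

# Assumption: the search pairs "nis" with either "ænig" or "nænig".
FIRST_WORDS = {"nis"}
SECOND_WORDS = {"ænig", "nænig"}


def mentions_both(citation: str) -> bool:
    """Return True if both search words occur in the citation, in any order."""
    words = set(re.findall(r"\w+", citation.lower()))
    return bool(words & FIRST_WORDS) and bool(words & SECOND_WORDS)


# Invented placeholder strings, not real corpus citations.
samples = [
    "nis nænig placeholder",
    "placeholder ænig placeholder nis",
    "nis placeholder",
]
matches = [s for s in samples if mentions_both(s)]
```

Because the words are collected into a set before testing, word order inside the citation does not matter, which mirrors the "in any order" behaviour of the search.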
Part 2: Building the Data Set
Then I analyzed each of these passages in context to determine if the passages contained
declarations of unknowing and if so, what topic the declarations of unknowing referred to.
Given the small size of the data set, it was enough to use an Excel file rather than a full-fledged
SQL database to capture it.
(show Excel file)
Part 3: Modelling the Passages as a Network Graph
Before even turning to Gephi (or any other network visualization tool), I had to think about how
I would represent my data as a network graph.
A network graph consists of a set of things, called nodes or vertices, pairwise connected by
relationships called edges. Network graphs are extremely versatile. In humanities research
specifically, network graphs can be used for almost anything. They can represent relationships
between characters (this network visualization of Les Miserables tracks interactions between
characters in the novel); or correspondence among a group of intellectuals who exchanged letters
over a period of time (this network visualization tracks letters between Enlightenment-era writers
and thinkers); or networks of sale and purchase of medieval manuscripts
(http://mappingbooks.blogspot.ca/2014/01/charting-former-owners-of-penns-codex.html); or—in
my case—the distribution of Old English formulaic phrases that occur within a group of Old
English texts. In short, you can use network graphs to visualize anything, as long as that thing
can be imagined as a network of relationships.
Whatever your research, then, your first step is the conceptual framework: within your data,
figure out what you want to model as things and what you want to model as relationships; in
network terms, what you want to model as nodes (points in your network) and what you model
as edges (linking lines in your network). At this point it might actually be helpful to sketch out a
tiny part of your data on paper, as an easy way of prototyping your graph.
In my visualization, I decided to model the declarations of unknowing themselves as
relationships between texts and topics (the latter usually supernatural). That is, the texts and
topics were the nodes; the declarations of unknowing were the lines. This model is in keeping
with the notion of traditional referentiality: in a tradition-based poetics, motifs exist not just in
themselves or in the texts where they appear; they exist and carry with them the context of the
entire tradition. I model the declarations as relationships because each of them mediates between
internal and external context, between the textual moment where it occurs and the wider tradition
it recalls.
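As a rough prototype of this model, here is a minimal Python sketch in which each declaration of unknowing is a record linking a text to a topic. The class and variable names are illustrative assumptions. The first pairing is the Blickling Homily 5 example from this project; the second record uses invented placeholder names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    """One declaration of unknowing: an edge from a text to a topic."""
    text: str
    topic: str


declarations = [
    Declaration("Blickling Homily 5", "God & Heaven"),
    Declaration("Placeholder Text", "Placeholder Topic"),  # hypothetical
]

# Nodes are the texts and topics; edges are the declarations themselves.
texts = {d.text for d in declarations}
topics = {d.topic for d in declarations}
nodes = texts | topics
edges = [(d.text, d.topic) for d in declarations]
```

Every edge runs from a text to a topic, so the graph has two kinds of nodes and no text-to-text or topic-to-topic links.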
Part 4: Data for Gephi
Once I knew what shape I wanted my graph to take, I put my data in a form understandable to
Gephi, the free, open-source network graphing software I am using.
First, I downloaded Gephi and installed it on my machine. (Unlike, say, Google Fusion Tables,
this is not an in-browser application.)
Then, based on my dataset, I created two spreadsheets: Nodes and Edges.
The first spreadsheet is Nodes. Nodes contains the texts and topics. The spreadsheet has three
columns: first, a unique id number; second, a label for the text or topic (a short title for the text,
e.g. the DOE Corpus short title “Sat” for the Old English poem Christ and Satan, or a short topic
description, e.g. “Doomsday”); third, a field that says either “Text” or “Topic” indicating if the
node in question is a text or a topic.
The second spreadsheet is Edges. Edges contains a list of the connections between nodes: a
row with a “source” node and a “target” node means that there’s a line between those two
nodes. That is, “61” under source and “2” under target means that there is a declaration of
unknowing in text 61 (Blickling Homily 5) referring to topic 2 (God & Heaven).
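To make the layout of the two spreadsheets concrete, here is a minimal Python sketch that writes both tables as comma-delimited text. The header names are assumptions: "Id", "Label", "Source" and "Target" follow the column names Gephi's spreadsheet import conventionally looks for, and "Kind" is a hypothetical name for the Text/Topic column. The id 61 follows the "source" value quoted above.

```python
import csv
import io

# Nodes: id, label, and whether the node is a text or a topic.
node_rows = [
    (2, "God & Heaven", "Topic"),
    (61, "Blickling Homily 5", "Text"),
]
# Edges: one row per declaration of unknowing (text id, topic id).
edge_rows = [(61, 2)]


def to_csv(header, rows):
    """Render a header plus rows as comma-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(["Id", "Label", "Kind"], node_rows)
edges_csv = to_csv(["Source", "Target"], edge_rows)
```

Writing `nodes_csv` and `edges_csv` to two separate files gives one file for the Nodes table and one for the Edges table.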
You can generate these two files manually, which is really tedious. Or you can generate them by
running some SQL queries on the larger Excel file containing your data.
However you generate them, make sure you save them as .csv (comma-delimited).
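One way the SQL route might look is sketched below in Python with the built-in sqlite3 module. It assumes the passages have first been loaded from the spreadsheet into a table; the table name `passages`, its columns, and the sample rows are all hypothetical placeholders, not the project's actual data or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical passages table: one row per declaration of unknowing.
conn.execute("CREATE TABLE passages (text_title TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO passages VALUES (?, ?)",
    [("Text A", "Topic X"), ("Text A", "Topic Y"), ("Text B", "Topic X")],
)

# Nodes table: every distinct topic and text gets its own id.
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT, kind TEXT)")
conn.execute(
    "INSERT INTO nodes (label, kind) "
    "SELECT DISTINCT topic, 'Topic' FROM passages ORDER BY topic"
)
conn.execute(
    "INSERT INTO nodes (label, kind) "
    "SELECT DISTINCT text_title, 'Text' FROM passages ORDER BY text_title"
)

# Edges: replace each passage's title and topic with the matching node ids.
edges = conn.execute(
    "SELECT t.id, p.id FROM passages AS d "
    "JOIN nodes AS t ON t.label = d.text_title AND t.kind = 'Text' "
    "JOIN nodes AS p ON p.label = d.topic AND p.kind = 'Topic' "
    "ORDER BY t.id, p.id"
).fetchall()
nodes = conn.execute("SELECT id, label, kind FROM nodes ORDER BY id").fetchall()
```

The `nodes` and `edges` results can then be written out as the two comma-delimited files.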
All right: now you have an idea of what you want your graph to look like; you have your data;
and you have Gephi on your computer. Your next step is to get your data into Gephi and
actually generate your network graph.
Part 5: Make the Graph
To import your data, open up Gephi. Select New Project, then Data Laboratory. Select Import
Spreadsheet. First select your Nodes file and make sure you designate it as the Nodes table on
the import screen.
Once you have successfully imported the Nodes, add the Edges. Again, select Import Spreadsheet.
Select your Edges file and make sure you designate it as the Edges table on the import screen.
Now, time to see and edit the resulting graph.
There are two kinds of edits you can make: structural (such as picking between different
network representation algorithms) and appearance-related (such as making small tweaks to the
representation of your network). Given the nature of my project—a small dataset, un-weighted
edges, data firmly shaped by close reading—the changes I made were appearance-related. Thus
the appearance of the graph is not entirely driven by the data; it is driven also by my
interpretation of the data.
1. I centred the graph
2. I showed node labels (i.e. short titles as per DOE Corpus for the texts and topic names for
the topics)
3. I dragged nodes into maximum visibility positions, with texts above and topics below
4. I sized nodes to correspond to the number of declarations of unknowing present in each
text
5. I decided on a colour scheme for the nodes
6. I tweaked font size
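The arithmetic behind step 4 (sizing nodes by the number of declarations) can be sketched in Python as follows. The sizing itself was done inside Gephi, so this only illustrates the counting. The edge list, the minimum size, and the step value are hypothetical.

```python
from collections import Counter

# Hypothetical (source, target) pairs: text id, then topic id.
edges = [(61, 2), (61, 5), (40, 2)]

# Count how many edges touch each node. For a text node this equals
# the number of declarations of unknowing found in that text.
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

MIN_SIZE = 10  # arbitrary base size for a node with no edges
STEP = 5       # arbitrary increase per declaration


def node_size(node_id: int) -> int:
    """Scale a node's display size linearly with its edge count."""
    return MIN_SIZE + STEP * degree.get(node_id, 0)
```

With these placeholder values, text 61 has two declarations and so gets a larger size than text 40, which has one.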
Important to note: Gephi has no undo button. But you can save multiple versions of your
project, so you can revert to an earlier version if you make any colossal mistakes along the way.
Despite some challenges with the documentation, what I liked about working with Gephi is that
it let me model, visualize, and examine my data systematically while at the same time allowing
me to handcraft the visualization. So the graph, as I noted earlier, is of course created from the
textual data but not wholly driven by it: it is also very much shaped by my own interpretive and
presentation decisions. This is in keeping with my own approach, which balances distant reading
of a large corpus and the wider tradition with focused close readings of individual texts.