The Trouble with House Elves Experiments in Computational Folkloristics Timothy R. Tangherlini

The Trouble with House Elves
Experiments in Computational Folkloristics
Timothy R. Tangherlini
2
A story…
It was the old counselor from Skårupgård who came
riding with four headless horses to Todbjærg church. He
always drove out of the northern gate, and there by the
gate was a stall, they could never keep that stall door
closed. They had a farmhand who closed it once after it
had sprung open. But one night, after he'd gone to bed,
something came after the farmhand and it lifted his bed
straight up to the rafters and crushed him quite hard.
Then the farmhand shouted and asked them to stop
lifting him up there. "No, you've tormented us, but now
you'll die..." I heard that's how two farmhands were
crushed to death. He wanted to close the door and then
they never tried to close it again.
3
• Some standard questions:
– Role of ghosts in late 19th century Denmark?
– Origins of the story?
– Structure of the story?
– Who, what, where of this story?
• Is there a need for a computational
folkloristics?
4
Folklore
• Early history of the discipline:
– Philological
– National Romanticism
• Johann Gottfried Herder (1744-1803)
– Wilhelm (1786-1859) and Jacob (1785-1863) Grimm
– Search for original forms
5
Romantic Nationalism in the Nordic lands
• Asbjørnsen and Moe (Norway)
– Development of the Norwegian language
• Linnaeus and the rush to categorize (Sweden)
• The ballad, archaeology and Svend Grundtvig (Denmark)
• The Kalevala and folklore as a science (Finland)
6
Mapping
Folklore
• Historic-geographic method
– Kaarle and Julius Krohn (1906-1924)
• Focused work on the Finnish epic, Kalevala
• Led to the type index of folk literature (Antti Aarne)
– Ripples on a pond theory of folklore diffusion
7
Maps in the study of culture
• Geography is not an inert container, is not a box
where cultural history "happens," but an active
force, that pervades the literary field and shapes
it in depth. Making the connection between
geography and literature explicit... will allow us
to see some significant relationships that have
so far escaped us
– (Moretti 1998, 3).
8
A New Historic-Geographic Method
• Folklore as a process:
– in time and space
– emerges from the dialectic between
individuals and tradition
• Maps can help model relationships
between:
– People
– Environment
– Folk Repertoires
9
Study Corpus
• Evald Tang Kristensen (1843-1929)
– Actively collected from 1865-1923
– 219 collecting trips
• 6500+ named informants
• 24,000 manuscript pages
• 250,000 published expressions
10
A multi-level folklore browser
• People
• Places
• Stories
11
Experiments in mapping
1. Mapping collecting routes
• Challenge Question (CQ): Did Tang Kristensen’s published
statements about his collecting accurately reflect his
collecting work?
2. Mapping individual repertoire distribution
• CQ: Does individual mobility influence the range of places
mentioned in stories?
• CQ: Do other informant features, such as gender, influence
the range of places mentioned?
3. Mapping by story features against individual repertoire
• CQ: Are there patterns, à la Moretti, that become apparent in
the visualization of stories by repertoire, genre and/or story
topic?
12
Experiment 1:
Mapping Collecting Routes
– ETK presents himself as a West Jutlander
– Political motivations
• Aftermath of Napoleonic wars and Danish
bankruptcy (1814)
• Loss of Schleswig to Bismarck (1864)
• Urbanization
– Search for “authentic” Danish culture
– What do the collecting routes reveal?
13
Experiments 2 & 3:
Mapping Repertoire
• Theory: Individual biography influences repertoire and its
features
• Hypothesis: Classes of individuals have different
degrees of physical mobility, and this is reflected in their
storytelling
• Hope: Maps reveal interesting patterns of places mentioned
– A caveat: My main interest is legends, which also make up the
vast majority of the collection: stories that refract the lived
environments and social organization of the tradition participants
14
Experiment 2:
Place Name Distribution and Mobility
– Target: repertoires of 5 storytellers
– Limit: only stories that mention places
– Method
• Plot place names mentioned by storyteller
• Calculate standard deviational ellipses for the
places mentioned in each storyteller’s repertoire
• Look for patterns in the underlying place name
distribution
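The ellipse step in the method above can be sketched in a few lines. This is a minimal illustration, not the GIS tool used for the experiments: it takes the one-sigma ellipse from the covariance of the plotted points, which is one common formulation of the standard deviational ellipse. The function name is hypothetical, and the coordinates are assumed to be planar (projected) values rather than raw latitude/longitude.

```python
import math

def standard_deviational_ellipse(points):
    """Return center, semi-axes and orientation of a 1-sigma ellipse.

    points: list of (x, y) pairs in planar coordinates (an assumption;
    raw latitude/longitude would need projecting first).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Population variances and covariance around the mean center.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the major axis, counter-clockwise from the x-axis.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Eigenvalues of the 2x2 covariance matrix give the squared semi-axes.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))
    return {
        "center": (mean_x, mean_y),
        "major": major,
        "minor": minor,
        "angle_deg": math.degrees(theta),
    }

# Made-up coordinates standing in for places named by one storyteller.
places = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(standard_deviational_ellipse(places))
```

A long, narrow ellipse suggests that the places a storyteller mentions are strung out along one direction, while a small, round one suggests a tightly local repertoire.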
15
Experiment 3:
Can unsupervised learning on text help in pattern discovery?
16
Experiment 3:
Unsupervised learning and Repertoire clusters
– Target: repertoires of 5 storytellers
– Limit: only stories that mention places
– Method
• Convert stories to TF-IDF vector representations
• Reduce dimensionality using SVD
• Cluster: ECM by storyteller
– eliminate small clusters
• Project results into GIS
• Calculate distribution ellipses for each cluster in
each person’s repertoire
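As a rough illustration of the first step in this method, the sketch below turns short texts into TF-IDF vectors and compares them by cosine similarity. It is an assumption-laden toy: the tokenizer, the weighting variant (term frequency times log(N / document frequency)) and the function names are all choices made here, and the SVD and clustering steps are left out because they would normally be done with a linear algebra library. The three sentences are invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a simplistic tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented toy texts, not items from the study corpus.
docs = [
    "the ghost lifted the farmhand to the rafters",
    "the house elf lifted the farmhand to the rafters",
    "the miller sold grain at the market",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Words that occur in every text receive a weight of zero under this weighting, so similarity is driven by the vocabulary that distinguishes stories from one another.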
17
A Crisis…
• Maps were informative since new patterns in the
geographic distribution of stories were
discovered…
– why hadn’t I known about these patterns before?
• What other types of patterns, some very small,
some very large, are lurking in the data?
• How can I be sure that my selection of examples
is representative or even accurate?
18
A Classic Folklore Problem
• Classification in folklore
– 1 text = 1 classifier
• What happens when the classifier was designed
for a different research problem?
• Are we missing patterns that are not solely related
to single topic classifiers?
• Are we missing stories in our searches because of
these single topic classifiers?
• Does this limit our ability to work with a large
archive?
19
Current folklore classifiers are very expensive
20
A lost story…
It was the old counselor from Skårupgård who came
riding with four headless horses to Todbjærg church. He
always drove out of the northern gate, and there by the
gate was a stall, they could never keep that stall door
closed. They had a farmhand who closed it once after it
had sprung open. But one night, after he'd gone to bed,
something came after the farmhand and it lifted his bed
straight up to the rafters and crushed him quite hard.
Then the farmhand shouted and asked them to stop
lifting him up there. "No, you've tormented us, but now
you'll die..." I heard that's how two farmhands were
crushed to death. He wanted to close the door and then
they never tried to close it again.
21
Networks to the rescue?
• Folklore as traditional communication across
social networks
– Folklore networks
• Social networks of tradition participants
• Networks of scholars and collectors
• Networks of stories
– External networks
• Communications networks
• Transportation networks
• Affiliation networks
– Internal networks
• Linguistic networks
22
Connecting the dots…
[Figure: network diagram connecting nodes labeled P1-P8, I1-I3 and S1-S4]
23
Storyteller networks
• Local networks
• Connect all storytellers in a given parish
• Connect all storytellers in a family
• Fieldtrip networks
• Connect all storytellers on a given fieldtrip
• Collector-Storyteller networks
• Connect all storytellers to all collectors with whom they
worked
• Inferred / Affiliation networks
• Connect storytellers by work groups (e.g. millers, fiddlers, etc.)
• Connect storytellers by other affiliations (e.g. gender, age,
education)
24
Story networks
• Connect stories to:
– People:
• storytellers
• people mentioned
– Places
• places collected
• places mentioned
– Each other
• By shared indexing
• By shared keyword (keyword extraction)
• By shared topic (topic modeling using LDA)
• By shallow ontology (tango index)
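One way to picture the "shared keyword" connection is as an edge list in which two stories are linked whenever they have a keyword in common, weighted by how many they share. The sketch below assumes keywords are already available as sets. The function name is hypothetical, the keywords for DS IV 650 and DS II B 147 are hand-picked from the excerpts quoted in these slides rather than produced by any keyword extraction, and the third story is invented.

```python
from collections import defaultdict
from itertools import combinations

def shared_keyword_edges(story_keywords, min_shared=1):
    """Link stories that share keywords; edge weight = number shared.

    story_keywords: {story_id: set of keywords}
    Returns {(story_a, story_b): weight} with story_a < story_b.
    """
    # Invert the mapping: keyword -> stories that contain it.
    stories_by_keyword = defaultdict(set)
    for story, keywords in story_keywords.items():
        for keyword in keywords:
            stories_by_keyword[keyword].add(story)
    # Every pair of stories under the same keyword gains one unit of weight.
    weights = defaultdict(int)
    for stories in stories_by_keyword.values():
        for a, b in combinations(sorted(stories), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_shared}

# Hand-picked illustrative keywords (assumption, see the note above).
stories = {
    "DS IV 650": {"counselor", "horses", "church", "farmhand", "bed", "rafters"},
    "DS II B 147": {"house elf", "farm", "feed", "farmhand", "bed", "rafters"},
    "hypothetical story": {"miller", "grain"},
}
print(shared_keyword_edges(stories))
```

The same inversion works for the other connection types in the list: replace keywords with index entries, topics or ontology terms and the edge construction is unchanged.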
25
An initial graph of the ETK study corpus
26
Lost in a thicket of stories, keywords, etc
27
Folklore Spaghetti
28
Graph clustering
• Use a tuned version of MCL clustering for
graphs
– iteratively generates stochastic matrices, also
known as Markov matrices (van Dongen
2000)
– 2973 nodes / 52663 edges
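For readers unfamiliar with MCL, the following is a bare-bones sketch of the textbook procedure: add self-loops, make the matrix column-stochastic, then alternate expansion (matrix squaring) and inflation (element-wise power followed by renormalization) until the matrix stops changing. It is not the tuned version used for the study corpus. The function names, the inflation value of 2.0 and the cutoff used to read clusters off the final matrix are assumptions, and the dense pure-Python matrices are only practical for toy graphs.

```python
def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    m = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        m[a][b] = m[b][a] = 1.0
    return m

def normalize_columns(m):
    """Scale each column to sum to 1 (a column-stochastic, Markov matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Plain Markov clustering; returns clusters as sorted lists of node ids."""
    n = len(adjacency)
    # Self-loops are a standard addition that helps convergence.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        # Expansion: one extra step of random-walk flow (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each non-empty row lists the nodes drawn to that attractor.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(mcl(adjacency_from_edges(6, edges)))
```

Raising the inflation parameter generally yields more, smaller clusters, which is one of the knobs a tuned version would adjust.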
29
Structure emerges and the graph becomes useful
30
Remember our ghost story?
• DS IV 650
• Classified as a story about manor lords,
not ghosts!
• Impossible to find in the archive
• Can I use networks to find this story?
• Will it help me find other stories of
interest?
31
32
33
Almost all the surrounding stories
are cataloged as ghost stories!
34
DS II B 147 is a story of interest—not a ghost story but
strongly connected to DS IV 650…
35
DS II B 147
• A story about a house elf at a farm in Egå...
• Ends as follows:
– When they got home, the farmhand was happy
because now he’d gotten something to use for feed,
and afterward nis could go and feed just as much as
he wanted to. Then they got another farmhand, and
he didn’t want to let him go on like that. But he got
lifted up in his bed and all the way up to the rafters, so
he lay there dead when people got up the next
morning.
36
The trouble with house elves…
• You can’t always find them…
• They act in unpredictable ways…
• The things they do turn out to be pretty
mean and nasty
37
New research question
• What is the relationship between ghosts and
house elves in 19th century Denmark and why
might there be such a relationship?
38
Some tentative conclusions
39
Directions for future work
• Labeling
– Can we automatically label nodes given a sparsely
labeled graph? (LDA-G, Homophily algorithms)
• Anomaly detection / Community detection
– Can we automatically find “stories of interest” on our
graph?
• Multimodal networks
– Integrate network information from several networks
• Dynamic networks
– Understanding how networks change over time
• Geographic visualization of network models
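The labeling question can be made concrete with a very simple homophily-style baseline: give each unlabeled story the label most common among its already-labeled neighbors, and repeat until nothing changes. This is only a sketch of the general idea, not LDA-G or any specific algorithm named above. The function name, the synchronous update scheme and the toy graph are assumptions made for illustration.

```python
from collections import Counter

def propagate_labels(neighbors, seed_labels, rounds=10):
    """Label unlabeled nodes by majority vote of their labeled neighbors.

    neighbors: {node: list of adjacent nodes}
    seed_labels: {node: label} for the sparsely labeled starting set;
    seed labels are never overwritten.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        updates = {}
        for node, adjacent in neighbors.items():
            if node in seed_labels:
                continue
            votes = Counter(labels[a] for a in adjacent if a in labels)
            if votes:
                # Ties go to the label seen first among the neighbors.
                updates[node] = votes.most_common(1)[0][0]
        if all(labels.get(node) == label for node, label in updates.items()):
            break  # nothing changed in this round
        labels.update(updates)
    return labels

# Invented toy graph: one unlabeled story among labeled neighbors.
graph = {
    "story_u": ["story_g1", "story_g2", "story_m1"],
    "story_g1": ["story_u"],
    "story_g2": ["story_u"],
    "story_m1": ["story_u"],
}
seeds = {"story_g1": "ghost", "story_g2": "ghost", "story_m1": "manor lord"}
print(propagate_labels(graph, seeds)["story_u"])
```

A story whose catalog label disagrees with the label its neighbors would vote for is also a natural candidate "story of interest" for the anomaly detection question.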
40
A Very Special Thanks to
• IPAM
– Peter Jones
– Mark Green
– Russ Caflisch
• Colleagues and friends from Search
Engines 2007
41
Additional thanks to
• Peter Broadwell, UCLA
• James Abello, DIMACS
• Tina Eliassi-Rad, LLNL/Rutgers
• Nischal Devanur, Rutgers
• UCLA’s Center for Digital Humanities
42
Funded by…
– Nordic Council of Ministers
– The American Council of Learned Societies
– NSF EAGER Grant IIS-0970179
• With Lancaster (ECAI), Buckland (ECAI), Eliassi-Rad (Rutgers) and Faloutsos (CMU)
– Google Books Humanities grant
– Many ideas derived from
• NEH Institute for Advanced Topics in Digital
Humanities, “Networks and Network Analysis for
the Humanities” (NEH HT5001609)