Analyzing Text at the
Middle Distance
between the Close Read and Culturomics
Marti A. Hearst
UC Berkeley
Joint Work with Aditi Muralidharan
Background: Culturomics (Text Mining)
Middle Distance: Sensemaking
Foreground: The Close Read
Definition: “Close Read”
“Close reading describes, in literary criticism, the
careful, sustained interpretation of a brief passage of
text. Such a reading places great emphasis on the
particular over the general, paying close attention to
individual words, syntax, and the order in which
sentences and ideas unfold as they are read.”
-English Wikipedia, 6/4/2012
“Power and Passion in Shakespeare’s Pronouns
Interrogating ‘you’ and ‘thou’”
Penelope Freedman, 2007, MPG Books, 280 pp.
Scene from “As You Like It” by Daniel Maclise (1806-70)
Conclusions (“Power and Passion in
Shakespeare’s Pronouns”)
“The subtleties of the use of ‘you’ and ‘thou’ that have
emerged … can seem, at worst, random or, at best,
unfathomable. …
A set of oppositions has been revealed here: … These
oppositions are complex and slippery: they may operate
in parallel, may converge or diverge. Each pronoun
choice has to be seen in a highly specific context.”
Definition: “Culturomics”
Narrower than “digital humanities” and
broader than “corpus linguistics”.
(Loose interpretation of definitions at culturomics.org)
“Culturomics” example:
middle distance vs. middle ground
As an NLP Researcher, where do your ideas come from?
Can HCI improve your work?
Sensemaking
• A vague information need
• Iteratively refine it by
• Searching
• Reading
• Analyzing
• Reach understanding
Pirolli and Card 2005, Pirolli and Russell 2011
Sensemaking for Literature Study
WordSeer (version 1)
The North American Pre-Civil War Slave Narratives
The North American Slave Narratives
• Stories of the lives of former slaves
• Published by white abolitionist sponsors
• About 3000 narratives survive
• ~300 in prototype
Do the North American slave
narratives all conform to the same
stereotypes?
A “Master Plan” for the slave narratives
“... conventions so early and firmly established that one
can imagine a sort of master outline drawn from the
great narratives and guiding the lesser ones”
-- Olney, J. “I was born: Slave Narratives and their Status as
Autobiography”, Callaloo, 1984
Our approach
• Phase 1: Support searching for instances of conventions
• Phase 2: Support visualizing their occurrence in the collection
Searching for stereotypes
• Keyword search is not enough
• Search words: “cruel” “harsh” “overseer” “master” “mistress”
• Instead: “overseer”, “master”, “mistress” described as “cruel”, “harsh”
• Also want the entire picture, for comparison
• “overseer”, “master”, “mistress” described as ____?_____
• ___?_____ described as “cruel”
Natural language processing
The cruel overseer beat us severely.
“cruel” is a modifier of “overseer”; “overseer” is the subject of “beat”; “us” is the object of “beat”
(automatically-extracted structure)
Grammatical search
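A minimal sketch of this kind of grammatical search, assuming a dependency parser has already reduced each sentence to (relation, head, dependent) triples. The `Relation` type, the `grammatical_search` function, and the second sample sentence are hypothetical illustrations, not WordSeer's actual code or data.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Relation(NamedTuple):
    kind: str        # "modifier", "subject", or "object"
    head: str        # the word being modified or governing the relation
    dependent: str   # the modifier, subject, or object word
    sentence: str    # the sentence the relation was extracted from


# Hand-written stand-in for parser output (assumption: a real system
# would produce these triples automatically).
RELATIONS = [
    Relation("modifier", "overseer", "cruel", "The cruel overseer beat us severely."),
    Relation("subject", "beat", "overseer", "The cruel overseer beat us severely."),
    Relation("object", "beat", "us", "The cruel overseer beat us severely."),
    # Invented sentence, included only so the blank-slot query has a contrast.
    Relation("modifier", "mistress", "kind", "The kind mistress taught us."),
]


def grammatical_search(
    relations: Iterable[Relation],
    kind: str,
    heads: Optional[Set[str]] = None,
    dependents: Optional[Set[str]] = None,
) -> List[Relation]:
    """Return relations of the given kind. A slot left as None is the blank."""
    return [
        r for r in relations
        if r.kind == kind
        and (heads is None or r.head.lower() in heads)
        and (dependents is None or r.dependent.lower() in dependents)
    ]


# "overseer", "master", "mistress" described as ____?____
described = grammatical_search(
    RELATIONS, "modifier", heads={"overseer", "master", "mistress"}
)

# ____?____ described as "cruel" or "harsh"
cruel = grammatical_search(RELATIONS, "modifier", dependents={"cruel", "harsh"})
```

Leaving one slot open is what separates this from keyword search: the query returns every word that fills the blank, so the reader sees the whole picture rather than only the expected terms.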
Phase 2: Visualizing stereotypes
• Prevalence
• Position of occurrence within a document
• Across the entire collection
“I was born”
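The two measurements behind such a view can be sketched as follows, assuming plain substring matching on lowercased text. The function names and the toy documents are invented for illustration and are not drawn from the narratives or from WordSeer.

```python
from typing import Dict, List


def relative_positions(text: str, phrase: str) -> List[float]:
    """Where each occurrence starts, as a fraction of document length (0.0 = start)."""
    if not phrase:
        raise ValueError("phrase must be non-empty")
    lowered = text.lower()
    needle = phrase.lower()
    positions = []
    start = lowered.find(needle)
    while start != -1:
        positions.append(start / len(lowered))
        start = lowered.find(needle, start + 1)
    return positions


def prevalence(docs: Dict[str, str], phrase: str) -> float:
    """Fraction of documents in the collection that contain the phrase at least once."""
    if not docs:
        return 0.0
    needle = phrase.lower()
    hits = sum(1 for text in docs.values() if needle in text.lower())
    return hits / len(docs)


# Toy collection (invented text, not quotations from the narratives).
DOCS = {
    "narrative_a": "I was born in a cabin. Years later I escaped north.",
    "narrative_b": "My mother was sold away. I never saw her again.",
}

positions_a = relative_positions(DOCS["narrative_a"], "I was born")
share = prevalence(DOCS, "I was born")
```

Plotting `relative_positions` for every document on a shared 0-to-1 axis shows whether a convention clusters at the start, middle, or end of the narratives; `prevalence` gives the collection-wide count.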
Results (presented at MLA 2012)
• Prevalent stereotypes
• “I was born”
• Separation from parents
• Cruel treatment
• Escape
• A ‘missed’ stereotype
• Parents’ death
• Not as strictly ordered as implied by Olney’s master plan.
Problems
• Vocabulary
• Same concept expressed with many different wordings
• Needed to see synonyms, nearby words, suggestions on searches
• Comparison and curation
• Couldn’t isolate and compare results on sub-collections of documents
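One way to surface "nearby words" as search suggestions is to count what appears within a few tokens of a target word. The sketch below assumes a simple regex tokenizer and a tiny hand-picked stopword list; the function name, the stopwords, and the sample sentences are all invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Tiny illustrative stopword list (assumption: a real tool would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "my", "me", "i", "in"}


def nearby_words(texts: Iterable[str], target: str, window: int = 3) -> Counter:
    """Count non-stopword tokens within `window` tokens of each occurrence of target."""
    target = target.lower()
    counts: Counter = Counter()
    for text in texts:
        toks = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            context = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            counts.update(
                w for w in context if w not in STOPWORDS and w != target
            )
    return counts


# Invented sample sentences.
SAMPLE = ["The cruel overseer whipped me.", "A harsh overseer drove us."]
suggestions = nearby_words(SAMPLE, "overseer", window=2)
```

Sorting `suggestions.most_common()` gives a ranked list of candidate terms a user might add to a search when the same concept is worded in many different ways.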
WordSeer (version 1.5)
wordseer.berkeley.edu
The complete works of Shakespeare
• 42 documents -- plays and sonnet collections
• 1589 -- 1612
Analyze Hamlet.
English 203: Hamlet in the Humanities Lab
Spring 2012, University of Calgary
How does the portrayal of men and
women in Shakespeare change in
different circumstances?
(CHI ’12 works in progress)
The Vocabulary Problem
Which words embody the concept of female
beauty?
261 results
Collection and Comparison
Does the treatment of love vary between the
comedies and tragedies?
Collection and Comparison
Step 2. Compare word usage
[Figure: usage of “in love” compared between the comedies and the tragedies]
Results
• WordSeer 1.5 is being used successfully (so far) in the Hamlet class
• How does the relationship between Hamlet and his mother change over the
course of the play?
• How does Act 1 portray the character of Horatio?
• Investigated changing language use around men and women
• Unknowingly replicated and extended previous findings by other
Shakespeare scholars
How Does This Apply to Social Media Language?
As an NLP Researcher, where do your ideas come from?
Can HCI improve your work?
Sentiment Analysis?
Sarcasm?
Summary
• We suggest enhancing NLP research with sensemaking tools to help with
hypothesis formation
• Midway between reading the text and blind statistics.
• Helps with hypothesis formulation, verification, and refinement.
• This is clearly useful for literature analysis.
• It remains to be seen if it can help with social media analysis.
Thank you!