The Development of a Stance Annotation Scheme:
Lessons for Computational Linguistics and Sociolinguistic Theory
Scott Kiesling¹, Jacob Eisenstein², James Fitzpatrick¹, & Umashanthi Pavalanathan²
University of Pittsburgh¹ – Georgia Institute of Technology²
Research in computational linguistics has increasingly turned to issues of interpersonal meaning, such as
sentiment (Pang and Lee 2008), emotion/affect (Tausczik and Pennebaker 2010), and politeness (Danescu-Niculescu-Mizil et al. 2013). In sociolinguistics and linguistic anthropology, a focus on interpersonal meaning
is far from new (Brown and Gilman 1960) but more recently has been conceptualized in terms of stancetaking.
Our partnership of computer scientists and sociolinguists is working towards the use of stance in corpus
analytic investigations.
In this paper, we report on efforts to create a corpus of online discourse annotated for stance, following a
version of Kiesling’s (2011) schema. Our corpus is composed of a subsample of posts to the popular internet
discussion site Reddit. We originally sampled a set of ‘subreddits’ to achieve a topically-balanced corpus:
fitness, parenting, metal (music), and the city-specific subreddits for Pittsburgh and Atlanta. Each subreddit
contains a set of threaded discussions, in which each new post may respond either to the original post (e.g.,
a question, claim, or link), or to other posts in the thread. For each post, we had three or four annotators
first determine the stance focus, and then annotate it on a five-point scale for affect (attitude towards the stance
object), alignment (attitude towards the interlocutor), and investment (attitude towards the talk itself).
Inter-rater agreement was computed using Krippendorff’s alpha (Hayes and Krippendorff 2007), and
disagreements were discussed, leading to refinements in the annotation guidelines. Given this training and
development regime, we expected agreement to increase over time, but it was relatively flat, at α = 0.6
for affect and α = 0.3 for the other two attributes. Because it seemed that different issues caused the
disagreements with each new conversation, we hypothesized a speech activity effect: for example, in the
parenting and local city subreddits, people tended to ask questions or give advice and help, while in the metal
and fitness subreddits, there was more argument and sarcasm.
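As an illustration of the agreement statistic reported above, the following is a minimal sketch of Krippendorff’s alpha for ratings on a numeric scale. It is not the authors’ code. It assumes the interval distance metric (squared difference between ratings), which the abstract does not specify; the function name and the list-of-lists input layout are hypothetical, with one inner list per post and None marking a missing rating (for posts rated by three rather than four annotators).

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (an assumption).

    units: list of lists, one inner list of ratings per post;
           None marks a rating that an annotator did not supply.
    """
    # Drop missing ratings, then keep only posts rated at least twice,
    # since a single rating cannot be paired with anything.
    pairable = [[v for v in unit if v is not None] for unit in units]
    pairable = [unit for unit in pairable if len(unit) >= 2]

    values = [v for unit in pairable for v in unit]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: squared differences within each post,
    # over ordered pairs of distinct ratings, weighted by 1 / (m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in pairable
    ) / n

    # Expected disagreement: squared differences over all ordered pairs
    # of distinct ratings in the pooled data, regardless of post.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")

    return 1.0 - d_o / d_e


# Hypothetical five-point affect ratings for four posts.
ratings = [[4, 4, 5, None], [2, 1, 2, 2], [3, 3, 3, None], [5, 4, 5, 5]]
alpha = krippendorff_alpha_interval(ratings)
```

Alpha is 1 when annotators agree perfectly and near 0 when agreement is at chance level. An ordinal metric would weight disagreements by rank distance instead, and may be the more natural choice for a five-point scale.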
We therefore moved to annotating a single subreddit over ten threads, focusing on the “Explain Like
I’m Five (ELI5)” subreddit, where the speech activity is more focused. Preliminary data suggest that the
repetition of speech activity has indeed increased agreement. This result suggests that stance is tightly related
to the speech activity, which should therefore be better integrated into models of interpersonal stance.
References
Brown, R., & Gilman, A. (1960). The pronouns of power and solidarity. In T. Sebeok (Ed.), Style in
Language, 253-276. Boston: MIT Press.
Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational
approach to politeness with application to social factors. In Proceedings of the Association for
Computational Linguistics.
Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding
data. Communication Methods and Measures, 1(1), 77-89.
Kiesling, S. F. (2011). Stance in context: Affect, alignment and investment in the analysis of stancetaking.
Presented at the iMean conference, 15 April 2011. The University of the West of England, Bristol,
UK.
Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information
Retrieval, 2(1-2), 1-135.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized
text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.