The Development of a Stance Annotation Scheme: Lessons for Computational Linguistics and Sociolinguistic Theory

Scott Kiesling¹, Jacob Eisenstein², James Fitzpatrick¹, & Umashanthi Pavalanathan²
University of Pittsburgh¹ – Georgia Institute of Technology²

Research in computational linguistics has increasingly turned to issues of interpersonal meaning, such as sentiment (Pang and Lee 2008), emotion/affect (Tausczik and Pennebaker 2010), and politeness (Danescu-Niculescu-Mizil et al. 2013). In sociolinguistics and linguistic anthropology, a focus on interpersonal meaning is far from new (Brown and Gilman 1960), but it has more recently been conceptualized in terms of stancetaking. Our partnership of computer scientists and sociolinguists is working towards the use of stance in corpus analytic investigations. In this paper, we report on efforts to create a corpus of online discourse annotated for stance, following a version of Kiesling’s (2011) schema. Our corpus is composed of a subsample of posts to the popular internet discussion site Reddit. We originally sampled a set of ‘subreddits’ to achieve a topically balanced corpus: fitness, parenting, metal (music), and the city-specific subreddits for Pittsburgh and Atlanta. Each subreddit contains a set of threaded discussions, in which each new post may respond either to the original post (e.g., a question, claim, or link) or to other posts in the thread. For each post, we had three or four annotators first determine the stance focus, and then annotate it on a five-point scale for affect (attitude towards the stance object), alignment (attitude towards the interlocutor), and investment (attitude towards the talk itself). Inter-rater agreement was computed using Krippendorff’s alpha (Hayes and Krippendorff 2007), and disagreements were discussed, leading to refinements in the annotation guidelines.
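The agreement computation described above can be sketched as follows. This is a minimal illustration only: the abstract does not say which distance metric was used, so the interval (squared-difference) metric, the function name `krippendorff_alpha_interval`, and the toy ratings are all assumptions, not the authors' actual procedure or data.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with an interval (squared-difference) distance.

    `units` is a list of posts; each post is a list of ratings, one per
    annotator, with None where an annotator did not rate the post.
    (Hypothetical helper written for illustration.)
    """
    # Keep only posts rated by at least two annotators ("pairable" values).
    pairable = []
    for unit in units:
        ratings = [r for r in unit if r is not None]
        if len(ratings) >= 2:
            pairable.append(ratings)

    all_values = [r for ratings in pairable for r in ratings]
    n = len(all_values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: within-post rating pairs, each post
    # weighted by 1 / (number of ratings in the post - 1).
    d_observed = sum(
        sum((a - b) ** 2 for a, b in permutations(ratings, 2))
        / (len(ratings) - 1)
        for ratings in pairable
    ) / n

    # Expected disagreement: all rating pairs regardless of post.
    d_expected = sum(
        (a - b) ** 2 for a, b in permutations(all_values, 2)
    ) / (n * (n - 1))

    if d_expected == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1.0 - d_observed / d_expected


# Toy affect ratings on a five-point scale (invented for illustration):
# four posts, up to four annotators, None marks a missing annotation.
toy_affect = [
    [4, 4, 5, None],
    [2, 1, 2, 2],
    [3, 3, 3, None],
    [5, 4, 4, 5],
]
print(round(krippendorff_alpha_interval(toy_affect), 3))
```

Running the same function separately on the affect, alignment, and investment ratings would give one alpha per attribute, which is how per-attribute figures such as those reported in this abstract could be compared across annotation rounds.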
Given this training and development regime, we expected agreement to increase over time, but it remained relatively flat, at α = 0.6 for affect and α = 0.3 for the other two attributes. Because different issues seemed to cause the disagreements with each new conversation, we hypothesized a speech activity effect: for example, in the parenting and local city subreddits, people tended to ask questions or give advice and help, while in the metal and fitness subreddits, there was more argument and sarcasm. We therefore moved to annotating a single subreddit over ten threads, focusing on the “Explain Like I’m Five (ELI5)” subreddit, where the speech activity is more focused. Preliminary data suggest that the repetition of speech activity has indeed increased agreement. This result suggests that stance is tightly related to the speech activity, which should therefore be better integrated into models of interpersonal stance.

References

Brown, R., & Gilman, A. (1960). The pronouns of power and solidarity. In T. Sebeok (Ed.), Style in Language, 253-276. Boston: MIT Press.
Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational approach to politeness with application to social factors. In Proceedings of the Association for Computational Linguistics.
Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77-89.
Kiesling, S. F. (2011). Stance in context: Affect, alignment and investment in the analysis of stancetaking. Presented at the iMean conference, 15 April 2011, University of the West of England, Bristol, UK.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.