Identifying speech acts in emails

Identifying speech acts in emails: Business English and non-native speakers In this work, we present an approach to the automated identification of speech acts in a corpus of workplace emails. As well as contributing to ongoing research in automated processing of emails for task identification (see e.g. (Cohen, Carvalho et al. 2004; Carvalho and Cohen 2006; Lampert, Dale et al. 2006; Lampert, Dale et al. 2008; Mildinhall and Noyes 2008)), the main goal of this work is to determine whether learners of English are using the appropriate structures for this type of text. In particular, tests of English for the business environment often require learners to write emails responding to specific task requirements, such as asking for information, or agreeing to a proposal, which can be assigned to basic speech act categories. Automated identification of such items can be used in the scoring of these open-content answers, for example noting whether all the speech acts required by the task are present in the students’ texts. To achieve this goal, we are working with a corpus of over 1000 such emails written by L2 English students in response to a test prompt. Each email is manually annotated at the sentence level for a speech act, using a scheme adapted from the Verbal Response Mode framework (Stiles 1992). This includes categories such as Acknowledgement, Question, Advisement (requesting the recipient do something), Disclosure (of personal thoughts or intentions), and Edification (stating objective information). We then identify a set of features designed to discriminate among the categories. Similar methods have been proven to be successful in acquiring usage patterns for lexical items such as prepositions and determiners (De Felice and Pulman 2008; Gamon, Gao et al. 2008; Tetreault and Chodorow 2008); one of the novel aspects of this work is to extend the classifier approach to aspects of the text above the lexical level. The features refer to several aspects of the sentence: punctuation, use of pronouns, use of modal verbs, presence of particular lexical items, and so on. A machine learning classifier is trained to associate particular combinations of features to a given speech act category, to automatically assign a speech act label to novel sentences. The success of this approach is tested by using ten-fold cross-validation on the L2 email set. Experiments are still ongoing, but full results will be ready by the time of the conference. To establish whether this method is specific to L2 texts, or can also be applied to L1 texts in a similar domain, we also intend to assess the performance of this classifier on L1 data, specifically a subset of the freely available Enron email corpus (Klimt and Yang 2004). Our work combines methods from corpus linguistics and natural language processing to offer new perspectives on both speech act identification in emails and the wider domain of automated scoring of open-content answers. Additionally, analysis of the annotated learner data can offer further insights into the lexicon and phraseology used by L2 English writers in this text type. References: Carvalho, V. and W. Cohen (2006). Improving email speech act analysis via n-gram selection. HLT-NAACL ACTS Workshop. New York City. Cohen, W., V. Carvalho, et al. (2004). Learning to classify email into "speech acts". EMNLP. Barcelona, Spain. De Felice, R. and S. Pulman (2008). A classifier-based approach to preposition and determiner error correction in L2 English. COLING. Manchester, UK. Gamon, M., J. Gao, et al. (2008). Using contextual speller techniques and language modeling for ESL error correction. IJCNLP. Klimt, B. and Y. Yang (2004). The Enron corpus: a new dataset for email classification research. European Conference on Machine Learning (ECML): 217-226. Lampert, A., R. Dale, et al. (2006). Classifying speech acts using verbal response modes. Australasian Language Technology Workshop (ALTW). Sydney, Australia: 3441. Lampert, A., R. Dale, et al. (2008). The nature of requests and commitments in email messages. AAAI Workshop on Enhanced Messaging. Chicago, USA: 42-47. Mildinhall, J. and J. Noyes (2008). Toward a stochastic speech act model of email behavior. Conference on Email and Anti-Spam (CEAS). Stiles, W. (1992). Describing Talk: a taxonomy of verbal response modes. Thousand Oaks, CA, Sage. Tetreault, J. and M. Chodorow (2008). The ups and downs of preposition error detection. COLING. Manchester, UK.

Identifying speech acts in emails

Related documents

Products

Support

Identifying speech acts in emails

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib