Corpus linguistics and stylistics

Stylistics

“Stylistics is the study of literature using methods, theories and concepts from linguistics” (Leech and Short 2007:1); it is “[...] the study of the relationship between linguistic form and literary function [...]” (Leech and Short 2007:3).

Style

Style is an idiosyncratic way of performing a particular action. It can be linguistic or non-linguistic.

“Style is a way in which language is used [...] style consists in choices made from the repertoire of the language [...] Stylistic choice is limited to those aspects of linguistic choice which concern alternative ways of rendering the same subject matter” (Leech and Short 2007:31)

“[...] the recognition and analysis of styles are squarely based on comparison.” (Enkvist 1973: 25-26)

“We match the text against another body of texts which we might label as norm, this norm being chosen because it is contextually relevant as a background for the text. […] Features whose densities are significantly different in the text and in the norm are style markers for the text in relation to the norm used. A change of norm may result in a different inventory of style markers. The norm may be chosen from a wide field. One portion of a text may be matched against other portions or the whole of the same text. One text may be compared to other texts. Or the text may be set against an imaginary norm that only exists in a critic’s mind” (Enkvist 1973: 25-26)

Corpus linguistics + stylistics = corpus stylistics

The advent of corpora, and the availability of a range of corpus-based techniques, have opened up new avenues in the study of literature, and of prose fiction in particular. The ‘corpus turn’ (Leech and Short 2007:284) refers to the ongoing trend in stylistics to use methods and tools from corpus linguistics for the analysis of literary and other texts. This approach is usually referred to as corpus stylistics.

Intra-textual vs. inter-textual approaches to electronic text analysis (Adolphs 2006):

Intra-textual: analysis of an individual text (e.g. via concordances, collocations, etc.)
Inter-textual: comparison of an individual text with other related texts or with a larger corpus

Corpus approaches and genre style

Biber (1988) used multivariate statistical techniques (factor analysis) over many different variables, where each variable is a linguistic feature (e.g. passive constructions).

Example: narrative versus non-narrative texts. The important variables are past tense verbs, 3rd person pronouns, perfect aspect and present participle clauses. High scores = narrative; low scores = non-narrative.

Corpus approaches and authorial style

These are studies attempting to ‘fingerprint’ authors, i.e. to identify linguistic items that distinguish the works of one author from those of others.

Burrows’s (1987) study of Jane Austen’s novels focused on closed-class words, such as the, and, of, a and to. Burrows found that these words can distinguish the works of different authors, different novels, and even the words spoken by different characters.

Hoover (2002) studied a series of corpora containing chunks from novels by different authors. The distribution of the 300 most frequent words in the corpus correctly clusters 15 out of 17 novels.

Corpus approaches and text style

Stubbs’s (2005) study of Joseph Conrad’s Heart of Darkness:

Comparison between word frequencies in Heart of Darkness and in the ‘Imaginative Writing’ section of the BNC.
Connection between keywords and Enkvist’s ‘style markers’.
Stubbs noticed that words and phrases expressing vagueness and uncertainty were more frequent in Heart of Darkness: ‘as though’, ‘seemed’, ‘kind of’, ‘sort of’.
Stubbs shows how the application of corpus methods can provide further justification for well-established interpretations, and new insights into the language and meaning potential of the text.
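
The kind of frequency comparison described for Heart of Darkness can be sketched in a few lines of Python. This is only a minimal illustration and not Stubbs's actual procedure: the two sample strings are invented stand-ins for the text and the reference corpus, and the helper names (tokenize, rate_per_thousand) are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def rate_per_thousand(phrase, tokens):
    """Occurrences of a (possibly multi-word) phrase per 1,000 tokens."""
    if not tokens:
        return 0.0
    target = phrase.lower().split()
    n = len(target)
    hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return 1000 * hits / len(tokens)

# Invented sample data (assumption): stand-ins for the text and the reference corpus.
text = "It seemed as though the river was a kind of dream. It seemed endless."
reference = "The river was long. The boat moved slowly and the men worked as usual."

# The vagueness markers listed in the handout.
markers = ["as though", "seemed", "kind of", "sort of"]
text_tokens = tokenize(text)
ref_tokens = tokenize(reference)

for marker in markers:
    print(f"{marker!r}: text {rate_per_thousand(marker, text_tokens):.1f} "
          f"vs reference {rate_per_thousand(marker, ref_tokens):.1f} per 1,000 words")
```

Normalising to a rate per 1,000 words makes texts of different lengths comparable. A real study would also test whether the differences are statistically significant before treating an item as a style marker.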

Corpus approaches and variation inside texts

Culpeper (2002) used WordSmith Tools to carry out a keyword analysis of the speech of the main characters in Romeo and Juliet. A file with the words spoken by each character was compared to a ‘reference corpus’ containing the words of all the other characters. The findings are relevant to an understanding of how the characters are linguistically constructed (characterisation).

Culpeper (2009) revisited the analysis using WMatrix.

Corpus tools

WordSmith Tools (Scott 2007) http://www.lexically.net/wordsmith/
AntConc (Anthony 2010) http://www.antlab.sci.waseda.ac.jp/software.html
WMatrix (Rayson 2003, 2008) http://ucrel.lancs.ac.uk/wmatrix/
Multilingual Corpus Toolkit (MLCT) (Piao) https://sites.google.com/site/scottpiaosite/software/mlct

WMatrix is a web-based environment that processes texts in three ways:

1) Word level: simple word frequency lists.
2) Grammatical level: CLAWS [1] (Constituent Likelihood Automatic Word-tagging System; see Garside 1987; Leech, Garside and Bryant 1994; Garside 1996; and Garside and Smith 1997) adds part-of-speech (POS) tags (with 96-97% accuracy) to the words in the text.
3) Semantic level: the USAS (UCREL [2] Semantic Analysis System) tagger assigns tags to the words (with 92% accuracy) from a set of predefined semantic fields. Currently there are 21 major fields (see Table 1), which are subdivided into 232 category labels to allow for more fine-grained classification. (More information about the semantic categories at http://ucrel.lancs.ac.uk/usas/)

Keyness: a concept made popular by Mike Scott and WordSmith Tools (Scott 1996, 1999, 2007).

WordSmith, AntConc and WMatrix all have a ‘Keyword’ facility that compares the word list from one text with the word list from another text, or from a reference corpus, to produce a list of words that are statistically ‘over-used’ (or ‘under-used’, depending on the settings selected) in the first text.
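
The keyword comparison described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used by WordSmith, AntConc or WMatrix: it assumes a commonly used two-term form of the log-likelihood statistic, a cut-off of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), hypothetical function names (log_likelihood, keywords) and invented toy data.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood for a word occurring a times in a text of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(text_tokens, ref_tokens, threshold=3.84):
    """Return (word, LL, direction) tuples for words at or above the threshold,
    sorted from highest to lowest LL. The threshold is an assumed cut-off."""
    text_freq, ref_freq = Counter(text_tokens), Counter(ref_tokens)
    c, d = len(text_tokens), len(ref_tokens)
    results = []
    for word in set(text_freq) | set(ref_freq):
        a, b = text_freq[word], ref_freq[word]
        ll = log_likelihood(a, b, c, d)
        if ll >= threshold:
            # Compare relative frequencies to decide the direction of the difference.
            direction = "over-used" if a / c > b / d else "under-used"
            results.append((word, round(ll, 2), direction))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Invented toy data (assumption): one character's words vs. everyone else's.
text_tokens = ["love"] * 6 + ["the"] * 4
ref_tokens = ["the"] * 5 + ["sword"] * 5

for word, ll, direction in keywords(text_tokens, ref_tokens):
    print(word, ll, direction)
```

With these toy counts, ‘love’ is flagged as over-used and ‘sword’ as under-used in the first text, while ‘the’, which occurs at a similar rate in both, falls below the cut-off.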

WMatrix extends the notion of keyness: it can compare texts not only at the word level (key words), but also at the grammatical level (key POS) and the semantic level (key concepts). WMatrix uses log-likelihood [3] to calculate keyness.

[1] See http://ucrel.lancs.ac.uk/claws/ for further information.
[2] University Centre for Computer Corpus Research on Language
[3] Log-likelihood information and wizard: http://ucrel.lancs.ac.uk/llwizard.html

Conclusions

The notion of ‘style’ is fundamentally based on comparison.
Corpus-based methods are relevant to the analysis of style in fiction/literature, and can also play a role in explaining and refining interpretations of texts.
A variety of corpus methods have been applied to the analysis of genres, authors and texts.
Corpus tools can be used for the analysis of the ‘content’ or themes of texts, and for the analysis of some aspects of ‘style’. However, manual analysis and interpretation of the tool’s output are still needed.

“[...] ‘corpus stylistics’ is not purely a quantitative study of literature. Rather, it is still a qualitative stylistic approach to the study of language of literature, combined with or supported by corpus-based quantitative methods and technology.” (Ho 2011:10)

References

Adolphs, S. (2006) Introducing Electronic Text Analysis. London: Routledge.
Biber, D. (1988) Variation across Speech and Writing. Cambridge: Cambridge University Press.
Burrows, J. (1987) Computation into Criticism: A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Oxford University Press.
Culpeper, J. (2002) ‘Computers, language and characterisation: An analysis of six characters in Romeo and Juliet’. In: U. Melander-Marttala, C. Ostman and M. Kyto (eds.), Conversation in Life and in Literature: Papers from the ASLA Symposium, Association Suedoise de Linguistique Appliquee (ASLA), 15. Uppsala: Universitetstryckeriet, pp. 11-30.
Enkvist, N. E. (1973) Linguistic Stylistics. The Hague: Mouton.
Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles’ The Magus. London: Continuum.
Hoover, D. L. (2002) ‘Frequent word sequences and statistical stylistics’. Literary and Linguistic Computing, 17, 2, 157-80.
Leech, G. N. and Short, M. (2007) Style in Fiction (2nd ed.). London: Longman.
Stubbs, M. (2005) ‘Conrad in the computer: examples of quantitative stylistic methods’. Language and Literature, 14, 1, 5-24.