Corpus Stylistics

Corpus Stylistics Outline
• Background and introduction to current work
• Methodology in Corpus Stylistics
• Applications of Corpus Stylistics
• References

Corpus Stylistics Background: What is Corpus Stylistics?
• The statistical study of style, i.e. the study of the relative frequency of elements in a text
  – Augustus de Morgan, 1851: suggested that disputes about the authenticity of some of the writings of St Paul might be settled by measuring the length of the words used in the various Epistles
  – T.C. Mendenhall, 1887: analysis of several authors’ frequency distributions of word length

Corpus Stylistics
• Corpus: a body or collection of linguistic data for use in research
• Since the early 1960s: interest in computer corpora, or machine-readable corpora
• Statements about the relative frequency of various linguistic items in a corpus have become very accurate

Corpus Stylistics
• Some uses of statistical analysis of style through corpora:
  – Education, e.g. EFL textbook writing
  – Establishing authorship, e.g. of unascribed manuscripts
  – Interpretive stylistics, e.g. studying the writer’s ideology and point of view

Corpus Stylistics Methodology
• Simple measures may characterise different styles:
  – average sentence length
  – average word length
  – type:token ratio (vocabulary richness)
    • number of types = number of different words
    • number of tokens = total number of words
  – vocabulary growth (homogeneity of text)
    • number of new types in the 1st, 2nd, …, nth 1000 words
    • in a rich, varied text, new types keep appearing, so the cumulative number of types climbs steadily
• These measures are especially useful when used comparatively

Corpus Stylistics Methodology (cont’d)
• More complex analyses can give a more interesting picture:
  – specific syntactic structures
  – degree of modification in NPs
  – types of verbs (e.g. verbs of persuasion, speech verbs, action verbs, descriptive verbs)
  – distribution of pronouns (1st/2nd/3rd person)
  – etc. (anything you can think of!)
• Quite sophisticated mathematical techniques can give an overall picture
  – e.g. factor analysis: identifies, from a (big) range of variables, which ones best identify/characterise differences

Corpus Stylistics Methodology (cont’d)
Multidimensional analysis
• Collect a huge range of measures of a wide variety (~150 features in all):
  – some simple word counts
  – syntactic features
  – classes and subclasses of N, V, Adj, Adv
• Factor analysis
  – choose a range of features to measure, and see which ones are correlated

Corpus Stylistics Methodology (cont’d)
• Example: corpus-based work that tries to quantify and characterise genre and register differences
• Work pioneered by Douglas Biber*
• Biber used statistical measures to identify linguistic features that co-occur, and could therefore be definitional of text types and genres
  – e.g. conjuncts such as therefore and nevertheless, together with use of the passive, indicate a more formal style
*D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, ch. 5: the study of discourse characteristics

Corpus Stylistics Methodology (cont’d)
• Corpora are useful not only for counting the frequencies of features, but also for:
  – Concordancing
    • listing occurrences of a word in context
    • identifying the syntactic use of a word
    • identifying its range of meanings
    • identifying the relative frequency of different uses/meanings
  – Collocation
    • What words occur together?
    • Comparing the distribution of close synonyms

Corpus Stylistics Methodology (cont’d)
Vocabulary in context
• A “concordance” is also known as a KWIC (key word in context) list
• It allows us to see the (immediate) environment in which a word appears
• Listings can be customised to show what you want more clearly, e.g.
  – sorted according to the next or previous word
  – showing more or less context

Corpus Stylistics Methodology (cont’d)
Collocation
• Term used by J.R. Firth (1957) to characterise (part of) his theory of meaning
• “You shall know a word by the company it keeps” (Firth 1957)
• “The occurrence of two or more words within a short space of each other in a text” (Sinclair 1991)
• “The relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey 1991)

Style and Corpora Methodology (cont’d)
Collocation, text type and style: an example
• Distinguish between general, more usual collocations and technical or more personal ones
• e.g. in a general corpus, time collocates with save, spend, waste, fritter away, …
• but in a corpus of sports reports, time collocates with half, full, extra, injury, first, second, third, …

Style and Corpora Applications
Stylometry
• An attempt to capture the essence of the style of a particular author by reference to a variety of quantitative criteria, usually lexical, called discriminators
• Study of frequently occurring features: word/sentence length, choice and frequency of words, vocabulary richness
• The ideal situation for authorship studies is when there are
  – large amounts of undisputed text, or
  – few contenders for the authorship of the disputed text(s)

Style and Corpora Applications (cont’d)
Author attribution
Establishing the author of an unascribed manuscript:
• Build corpora:
  – A: works definitely by author A
  – B: works definitely by author B
  – C: works of disputed authorship, but probably written by A or B
• Then select discriminators and associated measures
• Once the technique has been shown to discriminate effectively between A and B, try it on C
(M. Oakes: ‘Computational Stylometry’, in Handbook of Corpus Linguistics)

Style and Corpora Applications (cont’d)
Language Learning
• Frequency, in particular word frequency, already had a role in language learning before electronic corpora existed
• The ‘corpus revolution’ made frequency information about language use available on an unprecedented scale
• Frequency dictionaries and frequency-based grammatical information are becoming more and more widely available, and new sources of frequency information from the Web are being tapped
• Various kinds of knowledge found in present-day language textbooks (grammatical, collocational, semantic) are increasingly frequency-based
• In general, corpora represent real language usage
• In addition, “more frequent” can equal “more important” in many aspects of language learning

Style and Corpora Applications (cont’d)
Interpretive stylistics
• Programs such as WordSmith Tools and other Windows-based applications allow researchers to derive a list of keywords (words which occur significantly more often in a text than would be expected, when compared with a reference corpus)
• Keywords are a powerful and quick means of analysis; they have been used to examine discourses relating to specific social and cultural issues, and the ideology behind authors/texts
• See e.g. work by P. Baker on gender and sexual identity

Reading
Leech, G. Language in Literature: Style and Foregrounding (Longman, 2008), ch. 11
Leech, G. and Short, M. Style in Fiction (Routledge, 2007), ch. 2 and 3
Semino, E. and Short, M. Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing (Routledge, 2004)
Short, M. Exploring the Language of Poems, Plays and Prose (Longman, 1996), ch. 11
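
The simple measures on the first Methodology slide (average sentence length, average word length, type:token ratio and vocabulary growth) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `tokenize` and `sentences` are hypothetical helpers using a naive regex tokenizer and sentence splitter, not the behaviour of any particular corpus tool.

```python
import re


def tokenize(text):
    """Lower-cased word tokens (naive regex tokenizer; an assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (naive; an assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def average_sentence_length(text):
    """Mean number of word tokens per sentence."""
    sents = sentences(text)
    return sum(len(tokenize(s)) for s in sents) / len(sents)


def average_word_length(tokens):
    """Mean number of characters per token."""
    return sum(len(t) for t in tokens) / len(tokens)


def type_token_ratio(tokens):
    """Types (different words) divided by tokens (total words)."""
    return len(set(tokens)) / len(tokens)


def vocabulary_growth(tokens, chunk=1000):
    """Number of new types in the 1st, 2nd, ..., nth chunk of `chunk` tokens."""
    seen, growth = set(), []
    for start in range(0, len(tokens), chunk):
        new = set(tokens[start:start + chunk]) - seen
        growth.append(len(new))
        seen |= new
    return growth


text = "The cat sat on the mat. The dog sat too!"
tokens = tokenize(text)
print(average_sentence_length(text))       # 5.0
print(average_word_length(tokens))         # 2.9
print(type_token_ratio(tokens))            # 0.7
print(vocabulary_growth(tokens, chunk=5))  # [4, 3]
```

The running sum of `vocabulary_growth` gives the cumulative number of types. Because the type:token ratio falls as a text gets longer, comparing texts of very different lengths with it is misleading; comparing equal-sized samples is one common workaround.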
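
The multidimensional-analysis slide describes measuring many features and seeing which ones are correlated. Factor analysis itself (extracting factors from those correlations) is beyond a short sketch, but the first step can be illustrated: compute per-text feature rates and the Pearson correlation between every pair of features. The three feature word lists in `FEATURES` are hypothetical stand-ins, far fewer and cruder than the features Biber measures.

```python
import math
import re
from itertools import combinations

# Hypothetical feature set; multidimensional studies measure far more features.
FEATURES = {
    "conjuncts": {"therefore", "nevertheless", "however", "thus"},
    "first_person": {"i", "me", "my", "we", "us", "our"},
    "second_person": {"you", "your"},
}


def rates(text):
    """Frequency of each feature per 1,000 tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {name: 1000 * sum(t in words for t in tokens) / len(tokens)
            for name, words in FEATURES.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if a feature never varies


def feature_correlations(texts):
    """Correlation between every pair of features across a set of texts."""
    table = [rates(t) for t in texts]
    return {(f1, f2): pearson([row[f1] for row in table], [row[f2] for row in table])
            for f1, f2 in combinations(FEATURES, 2)}


texts = [
    "therefore the result holds nevertheless it is thus",
    "i think you know my friend and you see",
    "however we proceed therefore you and i",
]
for pair, r in feature_correlations(texts).items():
    print(pair, round(r, 2))
```

With only three toy texts the coefficients mean little; the point is the shape of the computation. Real studies compute such correlations over hundreds of texts before any factors are extracted.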
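
The “Vocabulary in context” slide describes a KWIC concordance that can be sorted by the next or previous word and can show more or less context. A minimal sketch follows; the function names `kwic` and `show` and the `width` and `sort_by` parameters are hypothetical, and tools such as WordSmith Tools offer far richer options.

```python
import re


def kwic(text, keyword, width=4, sort_by=None):
    """Return (left context, node, right context) for each occurrence of keyword.

    width   -- tokens of context shown on each side ("more or less context")
    sort_by -- None (text order), "next" (following word) or "prev" (preceding word)
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    if sort_by == "next":
        lines.sort(key=lambda line: line[2][0].lower() if line[2] else "")
    elif sort_by == "prev":
        lines.sort(key=lambda line: line[0][-1].lower() if line[0] else "")
    return lines


def show(lines):
    """Print a concordance with the node word aligned in a central column."""
    for left, node, right in lines:
        print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")


text = "We spend time at home. They waste time. Save time now."
show(kwic(text, "time", width=2, sort_by="prev"))
```

Sorting by the preceding word groups together the verbs that take time as an object (save, spend, waste). This is the kind of pattern the collocation slides go on to discuss.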
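
Hoey’s definition of collocation refers to items that appear with “greater than random probability” near a word. One common way to operationalise this is to compare the observed co-occurrence count within a window against the count expected if the words were independent, scored here as mutual information (MI). The sketch below assumes one simple formulation of the expected value; concordancers differ in the exact formula, window and cut-offs they use.

```python
import math
import re
from collections import Counter


def collocates(text, node, span=2):
    """Score words occurring within `span` tokens either side of `node`.

    Returns (word, (observed, mi)) pairs, highest MI first, where
    expected = f(node) * f(word) * 2 * span / N   (independence baseline)
    mi       = log2(observed / expected)
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    n_tokens = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            observed.update(window)
    scores = {}
    for word, obs in observed.items():
        expected = freq[node] * freq[word] * 2 * span / n_tokens
        scores[word] = (obs, math.log2(obs / expected))
    return sorted(scores.items(), key=lambda item: (-item[1][1], item[0]))


text = "save time spend time waste time save money"
for word, (obs, mi) in collocates(text, "time", span=1):
    print(f"{word:<8} observed={obs} MI={mi:.2f}")
```

MI is known to favour rare words, so in practice a minimum observed frequency is usually imposed. Running the same function on a general corpus and on a corpus of sports reports would be one way to reproduce the contrast in collocates of time described in the example slide.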
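
The author-attribution procedure (build corpora A, B and C, select discriminators, check that they separate A from B, then apply them to C) can be illustrated with a toy nearest-profile classifier. This is a sketch under loud assumptions. `FUNCTION_WORDS` is a hypothetical discriminator set. Mean absolute difference between relative-frequency profiles is a simple stand-in measure, not the method described in Oakes’ chapter. The short strings merely stand in for corpora.

```python
import re
from collections import Counter

# Hypothetical discriminators: relative frequencies of a few function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "while", "whilst"]


def profile(text, features=FUNCTION_WORDS):
    """Relative frequency of each discriminator in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[f] / len(tokens) for f in features]


def distance(p, q):
    """Mean absolute difference between two profiles (smaller = more similar)."""
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)


def attribute(disputed, corpus_a, corpus_b):
    """Assign the disputed text to whichever known corpus has the closer profile."""
    target = profile(disputed)
    d_a = distance(target, profile(corpus_a))
    d_b = distance(target, profile(corpus_b))
    return ("A" if d_a < d_b else "B"), d_a, d_b


corpus_a = "upon the hill and upon the road whilst the sun shone"
corpus_b = "on the hill and on the road while the sun shone"
disputed = "upon the river whilst the moon rose"
print(attribute(disputed, corpus_a, corpus_b)[0])  # A
```

In line with the slide’s advice, the classifier should first be run on held-out texts of known authorship by A and by B (texts not used to build the profiles). Only if it assigns those correctly would its verdict on C carry any weight.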
