Handout

Corpus linguistics and stylistics
Stylistics
“Stylistics is the study of literature using methods, theories and concepts from linguistics” (Leech and
Short 2007:1). It is “[...] the study of the relationship between linguistic form and literary function [...]”
(Leech and Short 2007:3).
Style
Style is an idiosyncratic way of performing a particular action. It can be non-linguistic or it can be
linguistic.
“Style is a way in which language is used [...] style consists in choices made from the repertoire of the
language [...] Stylistic choice is limited to those aspects of linguistic choice which concern alternative
ways of rendering the same subject matter” (Leech and Short 2007:31)
“[...] the recognition and analysis of styles are squarely based on comparison.” (Enkvist 1973: 25-26)
“We match the text against another body of texts which we might label as norm, this norm being chosen
because it is contextually relevant as a background for the text. […] Features whose densities are
significantly different in the text and in the norm are style markers for the text in relation to the norm
used. A change of norm may result in a different inventory of style markers.
The norm may be chosen from a wide field. One portion of a text may be matched against other
portions or the whole of the same text. One text may be compared to other texts. Or the text may be set
against an imaginary norm that only exists in a critic’s mind” (Enkvist 1973: 25-26)
Corpus linguistics + stylistics = corpus stylistics

The advent of corpora, and the availability of a range of corpus-based techniques, have opened
up new avenues in the study of literature, and prose fiction in particular.

The ‘corpus turn’ (Leech and Short 2007:284) refers to the ongoing trend in stylistics to use
methods and tools from corpus linguistics for the analysis of literary and other texts.

This approach is usually referred to as corpus stylistics.
Intra-textual vs. inter-textual approaches to electronic text analysis (Adolphs 2006):
Intra-textual: analysis of an individual text (e.g. via concordances, collocations, etc.)
Inter-textual: comparison of an individual text with other related texts or with a larger corpus
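As an illustration of the intra-textual approach, a concordance (key word in context) display can be sketched in a few lines of Python. This is a minimal sketch, not the output format of any of the tools discussed in this handout: the tokeniser is deliberately crude, the function names are hypothetical, and the sample sentence is invented for the example.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=3):
    """Return a (left context, node, right context) tuple for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample sentence, for illustration only.
sample = "It seemed to me as though the river seemed to flow backwards."

for left, node, right in concordance(tokenize(sample), "seemed"):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>25} [{node}] {right}")
```

Reading down the aligned node column is what lets an analyst spot recurrent patterns to the left and right of a word within a single text.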
Corpus approaches and genre style

Biber (1988) – multivariate statistical techniques
factor analysis; many different variables; variables = linguistic features (e.g. passive
constructions)

e.g. Narrative versus non-narrative texts
important variables = past tense verbs, 3rd person pronouns, perfect aspect, present participle
clauses; High scores = narrative; Low scores = non-narrative
Corpus approaches and authorial style

Studies attempting to ‘fingerprint’ authors: i.e. to identify linguistic items that distinguish the
works by one author from those of others.

Burrows’s (1987) study of Jane Austen’s novels focused on closed-class words, such as the, and,
of, a and to.

Burrows found that these words can distinguish the works of different authors, different novels,
and even the words spoken by different characters.

Hoover (2002) studied a series of corpora containing chunks from novels by different authors.

The distribution of the 300 most frequent words in the corpus correctly clusters 15 out of 17
novels.
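The general idea behind such studies, comparing texts by how often they use very common words, can be sketched as follows. This is only an illustration under stated assumptions: it does not reproduce the statistics used by Burrows or Hoover, the distance measure (mean absolute difference) is a simple stand-in chosen for the example, and the three short texts are invented.

```python
import re
from collections import Counter

# The closed-class words given as examples above.
FUNCTION_WORDS = ["the", "and", "of", "a", "to"]

def profile(text, words=FUNCTION_WORDS):
    """Relative frequency (per 1,000 tokens) of each word in `words`.

    Assumes the text contains at least one token.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [1000 * counts[w] / total for w in words]

def distance(p, q):
    """Mean absolute difference between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)

# Invented toy texts, far too short for real analysis.
text_a = "The cat sat on the mat and looked at the door."
text_b = "A dog ran to a park and a boy ran to the dog."
text_c = "The bird sat on the wall and sang to the cat."

pa, pb, pc = profile(text_a), profile(text_b), profile(text_c)
print("a vs b:", round(distance(pa, pb), 1))
print("a vs c:", round(distance(pa, pc), 1))
```

A smaller distance means two texts use the selected words at more similar rates. Real studies work with whole novels and far longer word lists (Hoover uses the 300 most frequent words), and then feed such profiles into clustering or other multivariate techniques.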
Corpus approaches and text style

Stubbs’s (2005) study of Joseph Conrad’s Heart of Darkness.

Comparison between word frequencies in Heart of Darkness and in the ‘Imaginative Writing’
section of the BNC

Connection between key-words and Enkvist’s ‘style markers’

Stubbs noticed words expressing vagueness and uncertainty were more frequent in Heart of
Darkness: ‘as though’, ‘seemed’, ‘kind of’, ‘sort of’.

Stubbs shows how the application of corpus methods can provide further justification for
well-established interpretations, and new insights into the language and meaning potential of the
text.
Corpus approaches and variation inside texts

Culpeper (2002) used WordSmith Tools to do a key-word analysis of the speech of the main
characters in Romeo and Juliet

A file with the words spoken by each character was compared to a ‘reference corpus’ containing
the words of all the other characters.

Findings are relevant to an understanding of how the characters are linguistically constructed
(characterisation).

Culpeper (2009) revisited the analysis, using WMatrix.
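The comparison design described above, in which each character's words are set against the words of all the other characters, can be sketched as follows. This is a minimal illustration, not a reconstruction of Culpeper's procedure or of WordSmith Tools: the function name is hypothetical, the speaker labels A, B and C are placeholders, and the dialogue is invented rather than quoted from the play.

```python
from collections import Counter

def split_target_reference(speeches, character):
    """Return (target_counts, reference_counts) for one character.

    `speeches` is a list of (speaker, text) pairs. The reference corpus is
    every word spoken by anyone other than `character`.
    """
    target, reference = Counter(), Counter()
    for speaker, text in speeches:
        words = text.lower().split()
        if speaker == character:
            target.update(words)
        else:
            reference.update(words)
    return target, reference

# Invented placeholder dialogue, for illustration only.
speeches = [
    ("A", "if you go then i go"),
    ("B", "we must go now"),
    ("A", "if only we could stay"),
    ("C", "go now and be quick"),
]

target, reference = split_target_reference(speeches, "A")
print("Target size:", sum(target.values()))
print("Reference size:", sum(reference.values()))
print("'if' in target vs reference:", target["if"], reference["if"])
```

The two word lists produced this way are the input to a key-word comparison: words that are unusually frequent in the target list relative to the reference list are candidates for that character's key words.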
Corpus tools
WordSmith Tools (Scott 2007)
http://www.lexically.net/wordsmith/
AntConc (Anthony 2010)
http://www.antlab.sci.waseda.ac.jp/software.html
WMatrix (Rayson 2003, 2008)
http://ucrel.lancs.ac.uk/wmatrix/
Multilingual Corpus Toolkit (MLCT) (Piao)
https://sites.google.com/site/scottpiaosite/software/mlct
WMatrix
A web-based environment that processes texts in three ways:
1) Word level – simple word frequency lists.
2) Grammatical level – CLAWS [1] (Constituent Likelihood Automatic Word-tagging System; see
Garside 1987; Leech, Garside and Bryant 1994; Garside 1996; and Garside and Smith 1997)
adds part-of-speech (POS) tags (with 96-97% accuracy) to the words in the text.
3) Semantic level – The USAS (UCREL [2] Semantic Analysis System) tagger assigns tags to the
words (with 92% accuracy) from a set of predefined semantic fields. Currently there are 21
major fields (see Table 1), which are subdivided into 232 category labels to allow for more
fine-grained classification.
(More info about semantic categories at http://ucrel.lancs.ac.uk/usas/)
Keyness – A concept made popular by Mike Scott and WordSmith Tools (Scott 1996, 1999, 2007)
WordSmith, AntConc and WMatrix all have a ‘Keyword’ facility that compares the word-list from one
text with the word-list from another text, or reference corpus, to produce a list of words that are
statistically ‘over-used’ (or ‘under-used’, depending on the settings selected) in the first text.
WMatrix extends the notion of keyness as it can not only compare texts at the word level (key words),
but also at the grammatical level (key POS), and the semantic level (key concepts).
WMatrix uses log-likelihood [3] to calculate keyness.
[1] See http://ucrel.lancs.ac.uk/claws/ for further information.
[2] University Centre for Computer Corpus Research on Language
[3] Log-likelihood info and wizard: http://ucrel.lancs.ac.uk/llwizard.html
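The log-likelihood comparison mentioned above can be sketched as follows. The sketch uses the standard two-corpus log-likelihood formula for comparing an item's frequency in a study text and in a reference corpus; that WMatrix computes exactly this internally is an assumption here, and the function names and counts are invented for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood for one item.

    a: frequency of the item in the study text
    b: frequency of the item in the reference corpus
    c: total number of tokens in the study text
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the item were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero count contributes nothing (0 * ln 0 is treated as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def is_overused(a, b, c, d):
    """True if the item is relatively more frequent in the study text."""
    return a / c > b / d

# Invented counts, for illustration only.
score = log_likelihood(a=20, b=10, c=1000, d=1000)
print(round(score, 2), is_overused(20, 10, 1000, 1000))
```

Higher values indicate a larger frequency difference between the two corpora. The score alone does not say which corpus has the higher relative frequency, hence the separate over-use check. A value above 3.84 corresponds to p < 0.05 for a chi-squared distribution with one degree of freedom.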
Conclusions

The notion of ‘style’ is fundamentally based on comparison

Corpus-based methods are relevant to the analysis of style in fiction/literature, and can also
play a role in explaining and refining interpretations of texts.

A variety of corpus methods have been applied to the analysis of genres, authors and texts.

Corpus tools can be used for the analysis of the ‘content’ or themes of texts, and for the analysis
of some aspects of ‘style’.

However, manual analysis and interpretation of the output of the tool are needed.
“[...] ‘corpus stylistics’ is not purely a quantitative study of literature. Rather, it is still a qualitative
stylistic approach to the study of language of literature, combined with or supported by
corpus-based quantitative methods and technology.” (Ho 2011:10)
References
Adolphs, S. (2006) Introducing Electronic Text Analysis. London: Routledge.
Biber, D. (1988) Variation across Speech and Writing. Cambridge: Cambridge University Press.
Burrows, J. (1987) Computation into Criticism: A Study of Jane Austen's Novels, and an Experiment in
Method. Oxford: Oxford University Press.
Culpeper, J. (2002) ‘Computers, language and characterisation: An analysis of six characters in Romeo
and Juliet’. In: U. Melander-Marttala, C. Östman and M. Kytö (eds.), Conversation in Life and in
Literature: Papers from the ASLA Symposium, Association Suédoise de Linguistique Appliquée (ASLA), 15.
Uppsala: Universitetstryckeriet, pp. 11-30.
Enkvist, N. E. (1973) Linguistic Stylistics. The Hague: Mouton.
Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles’ The
Magus. London: Continuum.
Hoover, D. L. (2002) ‘Frequent word sequences and statistical stylistics’. Literary and Linguistic
Computing, 17, 2, 157-80.
Leech, G. N. and Short, M. (2007) Style in Fiction (2nd ed). London: Longman.
Stubbs, M. (2005) ‘Conrad in the computer: examples of quantitative stylistic methods’. Language and
Literature, 14, 1, 5-24.