Ways of searching for the Zeitgeist of Modernity
- a corpus-based approach to modern fiction
Ilina Doykova
Shumen University, Shumen (Bulgaria)
Statistical analysis
Simple things may characterise different styles:
• average sentence length
• average word length
• vocabulary richness
• vocabulary growth (homogeneity of text)
More complex analyses give a more interesting picture:
• specific syntactic structures
• degree of modification in NPs
• types of verbs (e.g. verbs of persuasion, speech verbs, action verbs, descriptive verbs)
• distribution of pronouns (1st/2nd/3rd person)
• themes, beliefs, etc.
Especially when used comparatively
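The simpler measures listed above can be approximated in a few lines of code. The following is a minimal sketch, not the procedure used by WordSmith or Wmatrix: the tokenizer and sentence splitter are crude regular expressions, vocabulary richness is taken to be the type/token ratio (one common choice among several), and the sample sentence and all function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a deliberately crude regex tokenizer (assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def split_sentences(text):
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def style_profile(text):
    """Average sentence length, average word length and type/token ratio."""
    words = tokenize(text)
    sentences = split_sentences(text)
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }

def vocabulary_growth(text, step=5):
    """Number of distinct word types seen after every `step` tokens."""
    seen, curve = set(), []
    for i, word in enumerate(tokenize(text), start=1):
        seen.add(word)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Hypothetical sample text, not taken from the corpus.
sample = "She was alone. She walked alone along the shore, and the sea was grey."
profile = style_profile(sample)
growth = vocabulary_growth(sample)
print(profile)
print(growth)
```

Running the same functions over two authors or two corpora and comparing the resulting figures is the comparative use the list refers to. The type/token ratio depends heavily on text length, so it is only meaningful for samples of similar size.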
Linguistic Tools: WordSmith and Wmatrix
Useful features:
+ Tagging = identifies and labels PoS
+ WordList = generates word-frequency lists
+ Concordance = lists occurrences of a word in context and its immediate environment, which makes it possible to:
• Identify syntactic use of word
• Identify range of meanings
• Identify relative frequency of different uses/meanings
+ KWIC (key word) = identification of key words through a comparison with a reference corpus
+ Word Clouds = semantic tagsets in 21 domains
• Listings can be customised to show what you want more clearly:
sort according to next or previous word
show more or less context
highlight important information
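To make the WordList and Concordance features concrete, here is a minimal sketch of a frequency list and a key-word-in-context listing that can be sorted by the next word and shown with more or less context. It only illustrates the idea and does not reproduce how WordSmith or Wmatrix work internally; the tokenizer, the function names and the sample text are all assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a crude regex tokenizer (assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(tokens, top=None):
    """Frequency list, most frequent word first."""
    return Counter(tokens).most_common(top)

def concordance(tokens, node, width=3):
    """One (left, node, right) triple per hit, with `width` tokens of context.

    Context is counted in tokens and may cross sentence boundaries.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def sort_by_next_word(lines):
    """Sort concordance lines alphabetically by the word after the node."""
    return sorted(lines, key=lambda line: line[2].split()[:1])

# Hypothetical sample text, not taken from the corpus.
text = "She sat alone in the room. He walked alone by the sea. Alone again, she wrote."
tokens = tokenize(text)
hits = sort_by_next_word(concordance(tokens, "alone", width=3))
for left, node, right in hits:
    # Upper-casing the node word is a simple way to highlight it.
    print(f"{left:>20}  {node.upper()}  {right}")
```

Changing `width` shows more or less context, and sorting on `line[0].split()[-1:]` instead would order the lines by the previous word.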
Word Frequency List (Wmatrix)
WordSmith frequency list of predicative adjectives, Modern British Women Fiction Writers (MBWFW) Corpus
Key words list and dispersion plot (ALONE in MBWFW corpus)
Consistency analysis indicates whether a word is found consistently across many different texts, only in a narrow set of texts, or in a single specific text.
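The idea behind consistency analysis can be sketched as counting, for a given word, how often it occurs in each text and in how many texts it occurs at all. This is an illustration under stated assumptions, not WordSmith's own implementation: the tokenizer is a crude regular expression, and the three toy texts and the function name `consistency` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude regex tokenizer (assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def consistency(texts, word):
    """Per-text frequency of `word` and the number of texts containing it."""
    counts = {name: tokenize(body).count(word) for name, body in texts.items()}
    n_texts = sum(1 for count in counts.values() if count > 0)
    return counts, n_texts

# Hypothetical toy texts standing in for the files of a corpus.
texts = {
    "text_a": "Alone in the house, she felt alone.",
    "text_b": "The garden was full of people.",
    "text_c": "He was never alone for long.",
}
counts, n_texts = consistency(texts, "alone")
print(counts, f"found in {n_texts} of {len(texts)} texts")
```

A word with a high total frequency but a low text count is concentrated in a narrow set of texts, which is the distinction the analysis is meant to reveal.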
Lemmatized results for relational pairs
WordSmith and Wmatrix
Investigation of semantic domains through semantic tagging (Wmatrix)
Key Domain clouds (for Wmatrix only)
• The larger the word, the greater its “keyness” or uniqueness as compared to the BNC Written Sampler of imaginative texts.
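Keyness against a reference corpus is commonly measured with the log-likelihood statistic. The sketch below shows that calculation for a single item. It is an assumption that it matches the settings behind the clouds described here, and the frequencies and corpus sizes in the example are invented for illustration.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one item in a study corpus vs a reference corpus.

    freq_* are the item's frequencies, size_* the corpus sizes in tokens.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the item were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical counts: an item occurring 50 times in a 1,000-token study
# corpus and 10 times in a 1,000-token reference corpus.
score = log_likelihood(50, 1000, 10, 1000)
same = log_likelihood(10, 1000, 10, 1000)
print(round(score, 2), round(same, 2))
```

The higher the score, the less likely it is that the difference in relative frequency arose by chance. An item with identical relative frequency in both corpora scores zero.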
Comparison of linguistic software
Research and language learning
Word frequency
knowledge in present-day language textbooks (grammatical, collocational, semantic) is frequency-based;
Real usage
corpora represent actual, not prescribed usage;
find the best equivalent;
investigate word classes and specific syntactic structures;
Teaching collocations
‘trouble and strife’, ‘the elephant in the room’, ‘blue murder’
specific content (sexist, racist or ideological, etc.)
identification of true authorship
Analysis of texts written in any language and any alphabet
[1] Biber, D. et al. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
[2] Campbell, R. S. and Pennebaker, J. W. (2003). The secret life of pronouns: Flexibility in writing style and physical health. Psychological Science, 14, 60-65.
[3] Leech, G. N. and Short, M. (1981). Style in Fiction. London: Longman.
[4] Rayson, P. (2009). Wmatrix: A Web-based Corpus Processing Environment. Computing Department, Lancaster University.
[5] Rayson, P., Archer, D., Piao, S. L. and McEnery, T. (2004). UCREL Semantic Analysis System (USAS). (http://ucrel.lancs.ac.uk/usas/)
[6] Scott, M. (2012). WordSmith Tools, Version 6. Liverpool: Lexical Analysis Software.
[7] Seizova-Nankova, T. (2012). Primary school education and computer-based language study. BETA Papers.
[8] Seizova-Nankova, T. (in print). Developing collocational competence. A case study. 12th International Language, Literature and Stylistics Symposium, Edirne, Trakya University, Turkey.
[9] Semino, E. and Short, M. (2004). Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. Routledge.
[10] Sinclair, J. (2007). The Search for Units of Meaning. In Corpus Linguistics: Critical Concepts in Linguistics, Vol. 3. Routledge.
[11] Nishina, Y. (2007). A Corpus-Driven Approach to Genre Analysis: The Reinvestigation of Academic, Newspaper and Literary Texts. ELR Journal, 1 (2). (http://ejournals.org.uk/ELR/article/2007/2, accessed 27 June 2013)
[12] UCREL Home Page, Lancaster, UK. 1993-2013. 23 April 2013. (http://www.comp.lancs.ac.uk/research/)
Electronic text resources