Ways of searching for the Zeitgeist of Modernity - a corpus-based approach to modern fiction
Ilina Doykova
Shumen University, Shumen (Bulgaria)
ilina.doykova@abv.bg

Statistical analysis
• Simple things may characterise different styles
  – average sentence length
  – average word length
  – vocabulary richness
  – vocabulary growth (homogeneity of text)
• More complex analyses give a more interesting picture
  – specific syntactic structures
  – degree of modification in NPs
  – types of verbs (e.g. verbs of persuasion, speech verbs, action verbs, descriptive verbs)
  – distribution of pronouns (1st/2nd/3rd person)
  – themes, beliefs, etc.
  – authorship
• Especially when used comparatively

Linguistic Tools: WordSmith and Wmatrix
Useful features:
+ Tagging = identifies and labels PoS
+ WordList = generates word-frequency lists
+ Concordance = lists occurrences of a word in context and its immediate environment, gives access to collocates
  • Identify syntactic use of word
  • Identify range of meanings
  • Identify relative frequency of different uses/meanings
+ KWIC (key word) = identification of key words through a comparison with a reference corpus
+ Word Clouds = semantic tagsets in 21 domains
• Listings can be customised to show what you want more clearly:
  – sort according to next or previous word
  – show more or less context
  – highlight important information

Methodology
Word Frequency List (Wmatrix)
WordSmith frequency list of predicative adjectives, Modern British Women Fiction Writers Corpus
Key words list and dispersion plot (ALONE in MBWFW corpus)
Consistency analysis indicates whether a word is found consistently across lots of different texts or only in a narrow set of texts, or a specific text
Lemmatized results for relational pairs WordSmith and Wmatrix
Investigation of semantic domains through semantic tagging (Wmatrix)
Key Domain clouds (for Wmatrix only)
• The larger the word, the greater its “keyness” or uniqueness as compared to the BNC Written Sampler of imaginative texts.
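
Keyness by hand (illustrative sketch)
The idea of “keyness” against a reference corpus can be illustrated with a small script. This is only a sketch and not the internal procedure of WordSmith or Wmatrix: it assumes the log-likelihood statistic, which is commonly used for keyword comparison, a deliberately crude regular-expression tokenizer, and toy texts standing in for a study corpus and a reference corpus. The function names (tokenize, log_likelihood, keywords) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a crude tokenizer, assumed for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word, from its frequency in two corpora of known size."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(study_text, ref_text, top_n=10):
    """Return the words overused in the study text, ranked by log-likelihood."""
    study = Counter(tokenize(study_text))
    ref = Counter(tokenize(ref_text))
    n_study = sum(study.values())
    n_ref = sum(ref.values())
    scored = []
    for word, freq in study.items():
        # Keep only words that are relatively more frequent in the study text.
        if freq / n_study > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, n_study, ref[word], n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy texts only; a real study would load full corpus files instead.
    study_text = "alone alone alone she walked alone"
    ref_text = "she walked home and she walked back"
    for word, score in keywords(study_text, ref_text):
        print(f"{word}\t{score:.2f}")
```

A word with the same relative frequency in both corpora scores zero; the more its frequency in the study corpus exceeds what the reference corpus predicts, the higher the score, which is the property a keyness cloud renders as font size.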

Comparison of linguistic software

Research and language learning
• Word frequency: knowledge in present-day language textbooks (grammatical, collocational, semantic) is frequency-based
• Real usage: corpora represent actual, not prescribed, usage
• Translation: find the best equivalent
• Grammar: investigate word classes and specific syntactic structures
• Teaching collocations: ‘trouble and strife’, ‘the elephant in the room’, ‘blue murder’
• Decoding: specific content (sexist, racist, ideological, etc.)
• Authorship: identification of true authorship
• Analysis of texts written in any language and any alphabet

References
[1] Biber, Douglas et al. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
[2] Campbell, R. S., & Pennebaker, J. W. (2003). The secret life of pronouns: Flexibility in writing style and physical health. Psychological Science, 14, 60-65.
[3] Leech, G. N. and Short, M. (1981). Style in Fiction. London: Longman.
[4] Rayson, Paul. (2009). Wmatrix: A Web-based Corpus Processing Environment. Computing Department, Lancaster University.
[5] Rayson, P., Archer, D., Piao, S. L., McEnery (2004). UCREL Semantic Analysis System (USAS). (http://ucrel.lancs.ac.uk/usas/)
[6] Scott, M. (2012). WordSmith Tools, Version 6. Liverpool: Lexical Analysis Software. (http://www.lexically.net/wordsmith/index.html)
[7] Seizova-Nankova, T. (2012). Primary school education and computer-based language study. BETA Papers.
[8] Seizova-Nankova, T. (in print). Developing collocational competence. A case study. 12th International Language, Literature and Stylistics Symposium, Edirne, Trakya University, Turkey.
[9] Semino, E. and Short, M. (2004). Corpus Stylistics: Speech, writing and thought presentation in a corpus of English writing. Routledge.
[10] Sinclair, J. (2007). The Search for Units of Meaning. In Corpus Linguistics: Critical Concepts in Linguistics. Vol. 3. Routledge.
[11] Yasunori Nishina. (2007). “A Corpus-Driven Approach to Genre Analysis: The Reinvestigation of Academic, Newspaper and Literary Texts”, ELR Journal, 1 (2). (http://ejournals.org.uk/ELR/article/2007/2 (accessed 27 June 2013))
[12] UCREL Home Page, Lancaster, UK. 1993-2013. 23 April, 2013. (http://www.comp.lancs.ac.uk/research/)

Electronic text resources
• http://www.stylist.co.uk/books
• http://www.newyorker.com
• http://narrativemagazine.com
• http://www.one-story.com
• http://www.teachingenglish.org.uk/teaching-resources
• http://www.guardian.co.uk/books
• http://gutenberg.net.au/