Using Wmatrix for stylistic analysis

Aims of this session:
To practice using Wmatrix for corpus stylistic analysis
To introduce the concepts of corpus-based and corpus-driven stylistics
To test some hypotheses about Hemingway’s use of language

Approaches to Corpus Stylistics:
Corpus-assisted stylistics (based on O’Halloran)
Corpus-based stylistics and corpus-driven stylistics (based on a distinction by Tognini-Bonelli (2001))

Corpus-based linguistics:
… corpus evidence is brought in as an extra bonus rather than as a determining factor with respect to the analysis, which is still carried out according to pre-existing categories; although it is used to refine such categories, it is never really in a position to challenge them as there is no claim made that they arise directly from the data. (Tognini-Bonelli 2001: 66)

Corpus-driven linguistics:
There might be a large number of potentially meaningful patterns that escape the attention of the traditional linguist; these will not be recorded in traditional reference works and may not even be recognised until they are forced upon the corpus analyst by the sheer visual presence of the emerging patterns in a concordance page. (Tognini-Bonelli 2001: 86)

1. Log in to Wmatrix

1. To access the Wmatrix environment, type the following URL into your web browser: http://ucrel.lancs.ac.uk/wmatrix3.html
2. Log in using your user name and password.
3. Click on My folders.
4. Load For Whom the Bell Tolls using Tag Wizard in the top left-hand corner of the My Folders screen.

2. Making comparisons

Wmatrix allows you to compare your text/data with other data in terms of:
(i) words
(ii) parts of speech (POS)
(iii) semantic fields

This means that you can compare the word list for your text with the word list of, say, a larger corpus of data. The differences between the relative frequencies of words in the texts are tested for statistical significance using the log-likelihood (LL) calculation.
This results in a list of key words, with the most statistically significantly overused words at the top of the list. Similarly, this process can be done for parts of speech (to give key POS) and semantic groups (to give key semantic domains).

We will first make a key parts-of-speech comparison in order to test the following hypothesis:

Subordinating conjunctions are significantly under-represented in Hemingway’s writing.

We will compare the words in For Whom the Bell Tolls against the words in the BNC written imaginative sampler corpus (1 million words from the BNC written corpus), which is a corpus file already loaded into Wmatrix.

5. Click on the down arrow of the drop-down menu box in the POS row of the table. This should present you with a list of possible files with which to compare the POS categories of the For Whom the Bell Tolls file. The BNC files come as standard with Wmatrix, while the others are the other files loaded into the work area.
6. Select BNC Sampler Written Imag and click on Go (see below).

You should now see the following screen.

The list displayed in the screenshot above has eight columns:
(i) List link – click on this to see the words that fall in this category
(ii) Concordance link – click on this to see the words in this category as they occur in For Whom the Bell Tolls
(iii) The POS category (Item)
(iv) The raw total for that item in For Whom the Bell Tolls (text O1)
(v) The relative or percentage frequency of that item in For Whom the Bell Tolls (%1)
(vi) The raw total for that item in the BNC Sampler Written Imag (text O2)
(vii) The relative or percentage frequency of that item in BNC Sampler Written Imag (%2); a plus sign denotes that the item appears more in text O1 than it does in text O2
(viii) The log-likelihood (LL) score – a calculation of statistical significance.
Note that a log-likelihood score of 15.13 is the cut-off for 99.99% confidence of statistical significance.

Take a moment to look at the items in the list, and practice using the concordance links to look more closely at particular words in context.

7. Now click on Tagsets: POS at the top of the screen and find out which tags indicate subordinating conjunctions.
8. Now look for the subordinating conjunction tags in your list of key parts of speech. Remember: the hypothesis states that subordinating conjunctions should be significantly under-represented in Hemingway; that is, they should be under-used when compared against their distribution in the reference corpus. Is this really the case?

3. Corpus-driven stylistics

Having addressed the hypothesis, we’ll now take a corpus-driven approach and explore the data from the bottom up, in order to see what we can discover about Hemingway’s novel.

9. Go back to the main work area screen. Click on the down arrow of the drop-down menu box in the Semantic row of the table. This should present you with a list of possible files with which to compare the semantic categories of the For Whom the Bell Tolls file. The BNC files come as standard with Wmatrix, while the others are the other files loaded into the work area.
10. Select BNC Sampler Written Imag and click on Go (see below).
You should now see the following screen.

The list displayed in the screenshot above has nine columns:
(i) List link – click on this to see the words that fall in this category
(ii) Concordance link – click on this to see the words in this category as they occur in For Whom the Bell Tolls
(iii) The semantic category (Item)
(iv) The raw total for that item in For Whom the Bell Tolls (text O1)
(v) The relative or percentage frequency of that item in For Whom the Bell Tolls (%1)
(vi) The raw total for that item in the BNC Sampler Written Imaginative (text O2)
(vii) The relative or percentage frequency of that item in BNC Sampler Written Imaginative (%2); a plus sign denotes that the item appears more in text O1 than it does in text O2
(viii) The log-likelihood (LL) score – a calculation of statistical significance. Note that a log-likelihood score of 15.13 is the cut-off for 99.99% confidence of statistical significance
(ix) The name given to the semantic category

Take a moment to look at the items in the list, and practice using the concordance links to look more closely at particular words in context.

4. Now answer the following questions:
(i) What do the key semantic domains (i.e. those with a log-likelihood score of 15.13 or higher) tell us about the main themes of the story?
(ii) Are you surprised by any of the key semantic domains in the list?
(iii) Do the key semantic domains suggest anything particular about Hemingway as a writer?
(iv) What is the top key part of speech in For Whom the Bell Tolls and can you relate this to the themes of the story?
(v) Generate a list of keywords. What significance do you think ‘now’ has as a keyword?
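
Appendix: a sketch of the log-likelihood calculation

The log-likelihood scores used throughout this worksheet can be reproduced by hand for a single item. The Python sketch below assumes the two-corpus form of the statistic described on the UCREL log-likelihood calculator page; Wmatrix's own implementation is not shown in this handout and may differ in detail. The function names and the counts in the example are hypothetical and are not taken from For Whom the Bell Tolls or the BNC Sampler.

```python
import math

# Cut-off quoted in this worksheet for 99.99% confidence.
LL_CUTOFF = 15.13


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood for one item (word, POS tag or semantic tag).

    freq1, freq2: raw totals of the item in text 1 and text 2 (O1, O2).
    size1, size2: total number of tokens in text 1 and text 2.
    """
    total_size = size1 + size2
    total_freq = freq1 + freq2
    # Expected frequencies if the item were spread evenly across both texts.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    score = 0.0
    # A zero observed frequency contributes nothing (and log(0) is undefined).
    if freq1 > 0:
        score += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        score += freq2 * math.log(freq2 / expected2)
    return 2 * score


def direction(freq1, size1, freq2, size2):
    """Return '+' if the item is relatively more frequent in text 1, else '-'."""
    return "+" if freq1 / size1 > freq2 / size2 else "-"


if __name__ == "__main__":
    # Hypothetical counts: an item occurring 100 times in a 10,000-token text
    # and 50 times in a 10,000-token reference corpus.
    ll = log_likelihood(100, 10_000, 50, 10_000)
    sign = direction(100, 10_000, 50, 10_000)
    print(f"LL = {ll:.2f} {sign}  significant: {ll >= LL_CUTOFF}")
```

With these hypothetical counts the score comes out at roughly 16.99, which is above the 15.13 cut-off. Note that the score itself is always non-negative, so the direction of the difference (over-use or under-use in text 1) has to be read from the relative frequencies, which is what the plus sign in the Wmatrix table records.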