Using WMatrix for stylistic analysis
Aims of this session:
• To practice using WMatrix for corpus stylistic analysis
• To introduce the concepts of corpus-based and corpus-driven stylistics
• To test some hypotheses about Hemingway’s use of language
Approaches to Corpus Stylistics:
• Corpus-assisted stylistics (based on O’Halloran)
• Corpus-based stylistics (based on a distinction by Tognini-Bonelli (2001))
• Corpus-driven stylistics (based on a distinction by Tognini-Bonelli (2001))
Corpus-based linguistics:
… corpus evidence is brought in as an extra bonus rather than as a determining
factor with respect to the analysis, which is still carried out according to pre-existing
categories; although it is used to refine such categories, it is never really in a position
to challenge them as there is no claim made that they arise directly from the data.
(Tognini-Bonelli 2001: 66)
Corpus-driven linguistics:
There might be a large number of potentially meaningful patterns that escape the
attention of the traditional linguist; these will not be recorded in traditional reference
works and may not even be recognised until they are forced upon the corpus analyst
by the sheer visual presence of the emerging patterns in a concordance page.
(Tognini-Bonelli 2001: 86)
1. Log in to Wmatrix
1. To access the Wmatrix environment, type the following URL into your web browser: http://ucrel.lancs.ac.uk/wmatrix3.html
2. Log in using your user name and password
3. Click on My folders
4. Load For Whom the Bell Tolls using Tag Wizard in the top left-hand corner of the My
Folders screen.
2. Making comparisons
Wmatrix allows you to compare your text/data with other data in terms of:
(i) words
(ii) parts of speech (POS)
(iii) semantic fields
This means that you can compare the word list for your text with the word list of, say, a larger
corpus of data. The differences between the relative frequencies of words in the texts are
tested for statistical significance using the Log-likelihood (LL) calculation. This results in a list
of key words, with the most statistically significantly overused words at the top of the list.
Similarly, this process can be done for parts of speech (to give key POS) and semantic
groups (to give key semantic domains).
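To make the log-likelihood comparison concrete, here is a minimal sketch of the usual two-corpus LL formulation. It is assumed, not confirmed by this handout, that this matches what Wmatrix computes internally. The function name and the counts in the usage line are hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood (LL) score for one item compared across two texts.

    a: frequency of the item in text 01
    b: frequency of the item in text 02
    c: total number of words in text 01
    d: total number of words in text 02
    """
    # Expected frequencies if the item were spread evenly across both texts
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

# Hypothetical counts for illustration only (not taken from Wmatrix output)
score = log_likelihood(a=120, b=300, c=10_000, d=100_000)
print(round(score, 2))
```

The score on its own does not say whether the item is over-used or under-used; that comes from comparing the two relative frequencies.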
We will first make a key parts of speech comparison in order to test the following hypothesis:
Subordinating conjunctions are significantly under-represented in Hemingway’s
writing.
We will compare the words in For Whom the Bell Tolls against the words in the BNC written
imaginative sampler corpus (1 million words from BNC written corpus), which is a corpus file
already loaded into Wmatrix.
5. Click on the down arrow of the drop-down-menu box in the POS row of the table. This
should present you with a list of possible files with which to compare the POS
categories of the For Whom the Bell Tolls file. The BNC files come as standard with
Wmatrix, while the others are the other files loaded into the work area.
6. Select BNC Sampler Written Imag and click on Go (see below).
You should now see the following screen
The list displayed in the screenshot above has eight columns:
(i) List link – click on this to see the words that fall in this category
(ii) Concordance link – click on this to see the words in this category as they occur in For Whom the Bell Tolls
(iii) The POS category (Item)
(iv) The raw total for that item in For Whom the Bell Tolls (text 01)
(v) The relative or percentage frequency of that item in For Whom the Bell Tolls (%1)
(vi) The raw total of the item in the BNC Sampler Written Imag (text 02)
(vii) The relative or percentage frequency of that item in BNC Sampler Written Imag (%2); a plus sign denotes that the item appears more in text 01 than it does in text 02
(viii) The log-likelihood (LL) score – a calculation of statistical significance. Note that a log-likelihood score of 15.13 is the cut-off for 99.99% confidence of statistical significance
Take a moment to look at the words in the list, and practice using the concordance links to
look more closely at particular words in context.
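For readers who want to see what a concordance view does, the following is a minimal keyword-in-context (KWIC) sketch. It is not Wmatrix's own implementation. The function name, the `width` parameter and the sample sentence are all invented for illustration; the sample is not a quotation from the novel.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    # Match the keyword as a whole word, ignoring case
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines

# Invented sample text, for illustration only
sample = "He was here now. Now the bridge was quiet, and now he waited."
for line in concordance(sample, "now", width=20):
    print(line)
```

Aligning the keyword in a central column is what makes recurring patterns on either side of it easy to spot by eye.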
7. Now click on Tagsets: POS at the top of the screen and find out which tags indicate
subordinating conjunctions:
8. Now look for the subordinating conjunction tags in your list of key parts of speech.
Remember: the hypothesis states that subordinating conjunctions should be significantly
under-represented in Hemingway; that is, they should be under-used when compared against
their distribution in the reference corpus. Is this really the case?
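When reading a row of the table against the hypothesis, two things matter: the direction of the difference (is %1 higher or lower than %2?) and whether the LL score passes a significance cut-off. The sketch below is one way to combine them. The 15.13 cut-off comes from this handout; the other three are the standard chi-square critical values for one degree of freedom and are an addition here. The function name and the example figures are hypothetical.

```python
# LL cut-offs, highest first. 15.13 is the cut-off given in this handout;
# the others are standard chi-square critical values (1 degree of freedom).
CRITICAL_VALUES = [
    (15.13, "p < 0.0001"),
    (10.83, "p < 0.001"),
    (6.63, "p < 0.01"),
    (3.84, "p < 0.05"),
]

def interpret_row(pct1, pct2, ll):
    """Describe one table row given %1, %2 and the LL score."""
    direction = "over-used" if pct1 > pct2 else "under-used"
    for cutoff, level in CRITICAL_VALUES:
        if ll >= cutoff:
            return f"{direction} in text 01 ({level})"
    return "no significant difference"

# Hypothetical figures for illustration only (not real Wmatrix output)
print(interpret_row(pct1=1.2, pct2=2.5, ll=40.0))
```

For the hypothesis to be supported, the subordinating conjunction tags would need to come out as under-used in text 01 with an LL score at or above the chosen cut-off.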
3. Corpus-driven stylistics
Having addressed the hypothesis, we’ll now take a corpus-driven approach and explore the
data from the bottom up, in order to see what we can discover about Hemingway’s novel.
9. Go back to the main work area screen. Click on the down arrow of the drop-down-menu
box in the Semantic row of the table. This should present you with a list of possible files
with which to compare the semantic categories of the For Whom the Bell Tolls file.
The BNC files come as standard with Wmatrix, while the others are the other files loaded
into the work area.
10. Select BNC Sampler Written Imag and click on Go (see below).
You should now see the following screen:
The list displayed in the screenshot above has nine columns:
(i) List link – click on this to see the words that fall in this category
(ii) Concordance link – click on this to see the words in this category as they occur in For Whom the Bell Tolls
(iii) The semantic category (Item)
(iv) The raw total for that item in For Whom the Bell Tolls (text 01)
(v) The relative or percentage frequency of that item in For Whom the Bell Tolls (%1)
(vi) The raw total of the item in the BNC Sampler Written Imaginative (text 02)
(vii) The relative or percentage frequency of that item in BNC Sampler Written Imaginative (%2); a plus sign denotes that the item appears more in text 01 than it does in text 02
(viii) The log-likelihood (LL) score – a calculation of statistical significance. Note that a log-likelihood score of 15.13 is the cut-off for 99.99% confidence of statistical significance
(ix) The name given to the semantic category
Take a moment to look at the words in the list, and practice using the concordance links to
look more closely at particular words in context.
4. Now answer the following questions:
(i) What do the key semantic domains (i.e. those with a log-likelihood score of 15.13 or higher) tell us about the main themes of the story?
(ii) Are you surprised by any of the key semantic domains in the list?
(iii) Do the key semantic domains suggest anything particular about Hemingway as a writer?
(iv) What is the top key part-of-speech in For Whom the Bell Tolls and can you relate this to the themes of the story?
(v) Generate a list of keywords. What significance do you think ‘now’ has as a keyword?