LIN 3098 Corpus Linguistics
Practical Task IV
1 Introduction
The aim of today’s practical session is to introduce you to a suite of programs for
corpus analysis called WordSmith Tools. These are available in the labs, and we will
be using them in future practicals.
You will need:
o to open the WordSmith program (find it under Start>Programs)
o to use the British National Corpus, a local version of which can be found
here: \\10.254.64.9\iol-shared\bnc1.0\corpus
    o Note: you need to type the above into Windows Explorer
    o You may be prompted for a username and password to access the
    folder. The user/pass you need are the ones you use for accessing
    your ITS webmail. However, you need to type CSC\ before your
    username. For example:
        Username: CSC\agat1
        Password: mypassword
Another aim of the practical is to get you to exploit some features of the annotation
of a corpus.
2 The data and the program
o The BNC is structured into multiple subdirectories (called A\ through K\,
except I), each containing the corpus files. These are annotated in SGML. We
will only use the A directory today.
o WordSmith is actually composed of several programs. The three main ones are
Concord, Wordlist and Keywords. When you open it, you will see the Main
Controller Window, which looks like this:
3 Making a KWIC concordance in WordSmith
3.1 Meaning in context
There is a tradition in semantics, promoted by the British linguist J.R. Firth,
which emphasises that the meaning of words depends on context. He is especially
known for having said:
You shall know a word by the company it keeps.
In this practical, we shall try to identify different meanings of a word, based on the
company that the word keeps, that is, the different patterns of usage of the word in
context. Our focus will be on a single word: deal, both as a noun and as a verb. You
can probably think of several possible meanings of deal. Make a rough estimate:
1. There are roughly _________ different senses of deal as a verb.
2. There are roughly _________ different senses of deal as a noun.
3.2 Starting the WordSmith concordancer
We’ll explore the different senses of deal through a keyword in context (KWIC)
concordance.
1. In the WordSmith main controller, click Concord.
2. In Concord, choose File>New, as shown below.
3. You first need to choose the texts. You’ll see the window below. Click on
Choose Texts Now.
4. Use the file browser to navigate to the BNC directory. From there, drag the
subdirectory called A\ into the right window.
3.3 Making a simple concordance
To make a simple concordance, enter the search word in the search window, as shown
below. (For example, type deal).
Answer the following questions:
1. When you search for the word deal, do you see all inflectional forms of the
word?
2. Now try to use the * symbol (called a wildcard), typing deal*. Do your results
look any different? How? What do you think the wildcard means?
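If you want to see what a KWIC concordance does outside WordSmith, the following is a minimal Python sketch. It is not how WordSmith itself works: the function name kwic, the sample sentence and the treatment of * as "any run of letters" are all assumptions made for illustration.

```python
import re

def kwic(text, pattern, width=30):
    """Return (left, keyword, right) context triples for each match.

    A trailing * in `pattern` is read as "any run of word characters".
    This reading is an assumption of the sketch, not WordSmith's rule.
    """
    # Escape the search word, then map the escaped * to a regex wildcard.
    regex = re.escape(pattern).replace(r"\*", r"\w*")
    rows = []
    for m in re.finditer(rf"\b{regex}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Invented sample text, not taken from the BNC.
sample = "She made a deal. He deals cards. They dealt with it. A great deal of wine."

for left, key, right in kwic(sample, "deal*"):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>30} | {key} | {right}")
```

Try calling kwic(sample, "deal") and kwic(sample, "deal*") and compare the number of lines returned with what you saw in WordSmith.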
3.4 Making a concordance based on tags
Rather than searching for simple words, we can exploit the fact that the BNC has
morphosyntactic annotation. Before continuing, we need to let WordSmith know that
our corpus is tagged. Fortunately, it comes with presets for the BNC.
You will need to take the following steps:
1. In the main controller window, go to Settings>Adjust settings
2. In the new window, click on the Tags tab, as shown below. Then, from the
Custom Settings menu, choose British National Corpus (First Edition). NB:
Do not select “World Edition”.
3.4.1 Sampling
You’re very likely to find a huge number of occurrences when doing a concordance.
Too much data will hinder your analysis. You can instruct WordSmith to take a
random sample, as follows:
1. Go to the settings window as before.
2. This time, select the Concord tab (which specifies settings for the concordance
program).
3. Check the At random box.
4. In the form below, use 1 in 10.
5. Check the Auto-remove duplicates box.
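The idea behind these two settings can be sketched in a few lines of Python. This is only an illustration under assumptions: the function sample_hits is hypothetical, and whether WordSmith removes duplicates before or after sampling, or samples in exactly this way, is not something this sketch claims to reproduce.

```python
import random

def sample_hits(hits, one_in=10, seed=None):
    """Drop duplicate lines, then keep each remaining line with probability 1/one_in."""
    rng = random.Random(seed)
    # dict.fromkeys keeps the first occurrence of each line, in order.
    unique = list(dict.fromkeys(hits))
    return [h for h in unique if rng.random() < 1 / one_in]

# Invented concordance lines: 1000 hits, of which only 500 are distinct.
hits = [f"concordance line {i % 500}" for i in range(1000)]
sampled = sample_hits(hits, one_in=10, seed=42)
print(len(sampled), "lines kept out of", len(hits))
```

Because each line is kept independently with probability 1 in 10, the size of the sample varies from run to run unless you fix the seed.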
3.4.2 Extracting the data
(You might find it better to work with a partner for this part of the practical.)
While doing this exercise, use the sheet in the Appendix to make a note of the various
senses you think you can find of the word deal. Remember, we are interested in:
o deal as a noun, where the morphosyntactic tag starts with NN
o deal as a verb, where the morphosyntactic tag starts with V.
Remember, the BNC is annotated in SGML. The format of SGML tags is as follows:
o a tag is delimited by angle brackets, for example, <w> indicates a word token
o end tags in SGML (as opposed to XML) can be omitted; in fact, words in the
BNC have only start tags: <w>word
o attributes in the <w> tags in the BNC indicate the morphosyntactic category;
thus: <w NN1>man indicates a singular noun.
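To make the tag format concrete, here is a small Python sketch that splits a line of this kind into (tag, word) pairs. It assumes the simplified layout described above (a start tag <w TAG> followed directly by the word); the sample line is invented, and real corpus files contain other markup that this sketch ignores.

```python
import re

# One word token: "<w TAG>" followed by the text up to the next "<".
# Tag codes are assumed to consist of capitals, digits and hyphens.
TOKEN = re.compile(r"<w ([A-Z0-9-]+)>([^<]*)")

def parse_tokens(line):
    """Return a list of (tag, word) pairs found in one line of tagged text."""
    return [(tag, word.strip()) for tag, word in TOKEN.findall(line)]

# Invented example line in the style described above.
line = "<w AT0>the <w NN1>man <w VVD>made <w AT0>a <w NN1>deal"

for tag, word in parse_tokens(line):
    print(tag, word)
```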
Do the following:
1. To search for deal as a noun, conduct a search for singular (NN1) or plural
(NN2) forms. You can do this using the slash (/) symbol, which stands for
“or”:
<w NN1>deal/<w NN2>deal*
(Note: we use * in the second case because we’re interested in finding deals,
etc.)
2. Similarly, conduct a search for deal as a verb. Verb tags start with V, but there
are several of these. Try using a wildcard for the tag.
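One way to think about such a search string is as a pattern with alternatives (/) and wildcards (*). The Python sketch below translates a query of this shape into a regular expression. It is one plausible reading for illustration only: the function query_to_regex is hypothetical, and WordSmith's own matching rules may differ in detail.

```python
import re

def query_to_regex(query):
    """Translate a slash-separated query with * wildcards into a regex."""
    alternatives = []
    for part in query.split("/"):
        # Escape the literal text, then let * stand for any run of
        # characters other than whitespace and angle brackets.
        alternatives.append(re.escape(part).replace(r"\*", r"[^\s<>]*"))
    # The final lookahead stops "deal" from matching inside "dealer".
    return re.compile(r"(?:" + "|".join(alternatives) + r")(?![^\s<>])")

# Invented tagged text in the style of the corpus.
text = "<w NN1>deal <w NN2>deals <w NN1>dealer <w VVB>deal"

noun_query = query_to_regex("<w NN1>deal/<w NN2>deal*")
print(noun_query.findall(text))
```

A wildcard inside the tag itself is handled by the same substitution, so the function can also be used to check your answer to step 2.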
3.4.3 Carrying out your analysis
Your task now is to analyse these multiple occurrences of deal. Find sets of
occurrences of the word that seem to share a single, core “meaning” or “sense”. Here
are some questions you might want to ask:
1. Does a particular usage of deal occur as part of a larger, somewhat idiomatic
phrase (e.g. a great deal of wine)?
a. How would you gloss the meaning of this usage?
2. If deal is a verb, what sorts of arguments does it take?
a. Does it have an NP subject?
b. Does it have a prepositional phrase object (deal with X etc)?
c. Do all verb uses mean exactly the same thing, or are they subtly
different?
3. If deal is a noun, what is it an argument of?
a. Who makes a deal? People? Companies?
b. Can it be modified or quantified? (e.g. a compromise deal, a 1 million dollar deal)
c. Are there different sorts of deal that one can make?
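Some of these questions can be explored by counting the words that appear next to deal. The following Python sketch counts neighbouring words for tagged hits. It is an illustration under assumptions: the function neighbours and the sample lines are invented, the lines use the simplified <w TAG>word layout, and matching on the prefix "deal" will also pick up words such as dealer.

```python
import re
from collections import Counter

TOKEN = re.compile(r"<w ([A-Z0-9-]+)>([^<]*)")

def neighbours(lines, stem="deal", tag_prefix="V", offset=1):
    """Count the words found `offset` tokens away from tagged hits of `stem`.

    offset=1 looks at the next word, offset=-1 at the previous word.
    """
    counts = Counter()
    for line in lines:
        tokens = [(t, w.strip().lower()) for t, w in TOKEN.findall(line)]
        for i, (tag, word) in enumerate(tokens):
            if word.startswith(stem) and tag.startswith(tag_prefix):
                j = i + offset
                if 0 <= j < len(tokens):
                    counts[tokens[j][1]] += 1
    return counts

# Invented tagged lines, not taken from the BNC.
lines = [
    "<w PNP>we <w VVB>deal <w PRP>with <w NN2>complaints",
    "<w PNP>she <w VVD>dealt <w PRP>with <w PNP>it",
    "<w AT0>a <w AJ0>great <w NN1>deal <w PRF>of <w NN1>wine",
]

print(neighbours(lines))                               # words after verb uses
print(neighbours(lines, tag_prefix="NN", offset=-1))   # words before noun uses
```

Counts like these only point you towards patterns; deciding which occurrences share a sense still requires reading the concordance lines.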
APPENDIX: Uses and meanings of deal
Group    Category    Representative example    Gloss
1
2
3
4
5
6