LIN 3098 Corpus Linguistics
Practical Task IV

1 Introduction

The aim of today’s practical session is to introduce you to a suite of programs for corpus analysis called WordSmith Tools. These are available in the labs, and we will be using them in future practicals.

You will need:
o to open the WordSmith program (find it under Start>Programs)
o to use the British National Corpus (BNC), a local version of which can be found here: \\10.254.64.9\iol-shared\bnc1.0\corpus
o Note: you need to type the above path into Windows Explorer.
o You may be prompted for a username and password to access the folder. Use the same username and password that you use for your ITS webmail, but type CSC\ before your username. For example:
  Username: CSC\agat1
  Password: mypassword

Another aim of the practical is to get you to exploit some features of the annotation of a corpus.

2 The data and the program

o The BNC is structured into multiple subdirectories (called A\ through K\, except I), each containing the corpus files. These are annotated in SGML. We will only use the A directory today.
o WordSmith is actually composed of several programs. The three main ones are Concord, Wordlist and Keywords. When you open it, you will see the Main Controller window.

3 Making a KWIC concordance in WordSmith

3.1 Meaning in context

There is a tradition in semantics that was promoted by the British linguist J.R. Firth, who emphasised that the meaning of words is dependent on context. He is especially known for having said:

  You shall know a word by the company it keeps.

In this practical, we shall try to identify different meanings of a word, based on the company that the word keeps, that is, the different patterns of usage of the word in context. Our focus will be on a single word: deal, both as a noun and as a verb.

You can probably think of several possible meanings of deal. Make a rough estimate:
1. There are roughly _________ different senses of deal as a verb.
2. There are roughly _________ different senses of deal as a noun.

3.2 Starting the WordSmith concordancer

We’ll explore the different senses of deal through a keyword in context (KWIC) concordance.
1. In the WordSmith main controller, click Concord.
2. In Concord, choose File>New.
3. You first need to choose the texts. In the window that appears, click Choose Texts Now.
4. Use the file browser to navigate to the BNC directory. From there, drag the subdirectory called A\ into the right window.

3.3 Making a simple concordance

To make a simple concordance, enter the search word in the search window (for example, type deal).

Answer the following questions:
1. When you search for the word deal, do you see all inflectional forms of the word?
2. Now try to use the * symbol (called a wildcard), typing deal*. Do your results look any different? How? What do you think the wildcard means?

3.4 Making a concordance based on tags

Rather than searching for simple words, we can exploit the fact that the BNC has morphosyntactic annotation. Before continuing, we need to let WordSmith know that our corpus is tagged. Fortunately, it comes with presets for the BNC. You will need to take the following steps:
1. In the main controller window, go to Settings>Adjust settings.
2. In the new window, click on the Tags tab. Then, from the Custom Settings menu, choose British National Corpus (First Edition). NB: Do not select “World Edition”.

3.4.1 Sampling

You’re very likely to find a huge number of occurrences when doing a concordance. Too much data will hinder your analysis. You can instruct WordSmith to take a random sample, as follows:
1. Go to the settings window as before.
2. This time, select the Concord tab (which specifies settings for the concordance program).
3. Check the At random box.
4. In the form below, use 1 in 10.
5. Check the Auto-remove duplicates box.
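
The effect of the two sampling settings (a 1 in 10 random sample and duplicate removal) can be pictured with a short Python sketch. This is only an illustration of the idea and makes no claim about how WordSmith implements it: the function name sample_concordance, the parameters rate and seed, and the example lines are all hypothetical, and the sketch assumes that "1 in 10" means each line is kept with probability 1/10.

```python
import random


def sample_concordance(lines, rate=10, seed=None):
    """Keep each line with probability 1/rate, then drop exact duplicates.

    lines -- a list of concordance lines (plain strings)
    rate  -- keep roughly 1 line in every `rate` (assumed meaning of "1 in 10")
    seed  -- optional seed, so that the same sample can be reproduced
    """
    rng = random.Random(seed)
    # randrange(rate) returns 0 about once in every `rate` calls
    sampled = [line for line in lines if rng.randrange(rate) == 0]
    # dict.fromkeys keeps the first occurrence of each line, in order
    return list(dict.fromkeys(sampled))


if __name__ == "__main__":
    # Hypothetical concordance lines, invented for illustration only
    hits = ["a great deal of wine", "deal with the problem"] * 50
    print(sample_concordance(hits, rate=10, seed=1))
```

Because the sample is random, a different sample is drawn each time unless a seed is fixed, so two students may well see different lines.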
3.4.2 Extracting the data

(You might find it better to work with a partner in this part of the practical.)

While doing this exercise, use the sheet in the Appendix to make a note of the various senses of the word deal that you think you can find. Remember, we are interested in:
o deal as a noun, where the morphosyntactic tag would start with NN
o deal as a verb, where the morphosyntactic tag would start with V.

Remember, the BNC is annotated in SGML. The format of SGML tags is as follows:
o a tag is delimited by angle brackets; for example, <w> indicates a word token
o end tags in SGML (as opposed to XML) can be omitted; in fact, words in the BNC have only start tags: <w>word
o attributes in the <w> tags in the BNC indicate the morphosyntactic category; thus, <w NN1>man indicates a singular noun.

Do the following:
1. To search for deal as a noun, conduct a search for singular (NN1) or plural (NN2) forms. You can do this using the slash (/) symbol, which stands for “or”: <w NN1>deal/<w NN2>deal*. (Note: we use * in the second case because we’re interested in finding deals etc.)
2. Similarly, conduct a search for deal as a verb. Verb tags start with V, but there are several of these. Try using a wildcard for the tag.

3.4.3 Carrying out your analysis

Your task now is to analyse these multiple occurrences of deal. Find sets of occurrences of the word that seem to share a single, core “meaning” or “sense”. Here are some questions you might want to ask:
1. Does a particular usage of deal occur as part of a larger, somewhat idiomatic phrase (e.g. a great deal of wine)?
   a. How would you gloss the meaning of this usage?
2. If deal is a verb, what sorts of arguments does it take?
   a. Does it have an NP subject?
   b. Does it have a prepositional phrase object (deal with X etc.)?
   c. Do all verb uses mean exactly the same thing, or are they subtly different?
3. If deal is a noun, what is it an argument of?
   a. Who makes a deal? People? Companies?
   b. Can it be modified or quantified? (e.g. a compromise deal, a 1 million dollar deal)
   c. Are there different sorts of deal that one can make?

APPENDIX: Uses and meanings of deal

Group   Category        Representative example          Gloss
1       ____________    __________________________      ____________
2       ____________    __________________________      ____________
3       ____________    __________________________      ____________
4       ____________    __________________________      ____________
5       ____________    __________________________      ____________
6       ____________    __________________________      ____________
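
The tag-based searches from section 3.4.2 can also be pictured as pattern matching over the raw SGML text. The Python sketch below is a rough approximation, assuming the <w TAG>word format described in that section. The function find_tagged, its parameters and the sample string are hypothetical, and WordSmith's own matching may behave differently (for instance with respect to case).

```python
import re


def find_tagged(text, tag_prefix, word, wildcard=False):
    """Return (tag, word) pairs for tokens such as <w NN1>deal.

    tag_prefix -- start of the morphosyntactic tag, e.g. "NN" or "V"
    word       -- the search word, e.g. "deal"
    wildcard   -- if True, behave like deal* (allow further letters)
    """
    tail = r"\w*" if wildcard else ""
    pattern = re.compile(
        # [^>]* lets the tag continue after the prefix (NN -> NN1, NN2, ...)
        r"<w (" + re.escape(tag_prefix) + r"[^>]*)>"
        # \b stops "deal" from matching inside "deals" unless wildcard is set
        r"(" + re.escape(word) + tail + r")\b",
        re.IGNORECASE,  # assumption: we also want sentence-initial "Deal"
    )
    return pattern.findall(text)


if __name__ == "__main__":
    # Hypothetical tagged text, invented for illustration only
    sample = ("<w AT0>a <w AJ0>great <w NN1>deal <w PRF>of <w NN1>wine "
              "<w PNP>he <w VVD>dealt <w PRP>with <w NN2>deals")
    print(find_tagged(sample, "NN", "deal", wildcard=True))
    print(find_tagged(sample, "V", "deal", wildcard=True))
```

Using a tag prefix such as "V" here plays the same role as putting a wildcard in the tag when searching for verbs in Concord.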