Chapter 3: Methods in corpus linguistics: Interpreting concordance lines
(Hunston, Susan. Corpora in Applied Linguistics. CUP, 2002)

3.1 Introduction
This chapter offers a number of examples of corpus searches, each one illustrating one or more points about the methodology of finding and interpreting concordance lines, and about the kind of findings that emerge from such study.
‘word-based’ methods cf.) ‘category-based’ methods (chapter 4)

The topics in this chapter
・What kind of searches are useful in finding out about how English works?
・How can concordance lines be presented in an accessible way?
・What are concordance lines useful for?
・What view of language does this chapter want to put across, and what general assumptions about language investigation are made in offering concordance lines as a source of information?

3.2 Searches, concordance lines and their presentation
A concordancer is a program that searches a corpus for a selected word or phrase and presents every instance of that word or phrase in the centre of the computer screen, with the words that come before and after it to the left and right.
a node word = the selected word, appearing in the centre of the screen
Lines are sorted to make the patterning in them more visible.
The words focused on are printed in bold.
e.g.) critical
・(be, is) critical, (a, his, this) + critical + (noun), self-critical
・be critical of … ‘negative opinion’
・be critical to, be critical in … ‘important’
・critical + (noun) … ‘important’
・more predicative use than attributive use
Search for a phrase, or for specific word-classes
e.g.) on + adjective + terms + with
The adjectives indicate:
・a degree of closeness (familiar, friendly, intimate)
・whether or not the two groups like each other (good, reasonable, bad)
・a similarity in status (equal)

3.3 What is observable from concordance lines?
3.3.1 Observing the ‘central and typical’
Linguistic description focuses on distinctions between what ‘can’ and ‘cannot’ be said in a particular language, with little regard for whether what is possible occurs frequently or rarely in practice. (Swan)
There is no ‘demarcation’ between the correct and the incorrect. In place of demarcation, a corpus offers information that a native speaker cannot replicate: an indication of ‘central and typical’ usage. (Hanks, Sinclair)
‘Typical’ is used to describe the most frequent meanings, collocates or phraseology of an individual word or phrase.
e.g.) recipe for
・more metaphorical meaning than literal meaning
・recipe for + N which has a negative meaning
・BE + a + recipe for
↓
typical use: something is a recipe for something bad
The concept of ‘centrality’ is applied to categories of things rather than to individual words.
e.g.)
central use of the present progressive: reference to present time
a central adjective: an adjective which occurs both attributively and predicatively
Native speakers’ intuitions about typicality do not always accord with evidence of frequency.
Prototypical: a usage which is commonly felt to be typical but which is not necessarily most frequent. (Barlow and Shortall)
・English teaching course books
e.g.) comparatives
(course book) The USSR is larger than China.
(corpus) a larger plan, their larger but poorer northern neighbours: comparison is implicit
e.g.) reflexive pronouns
(course book) be proud of oneself, be proud of one’s child
(corpus) FIND + oneself; SEE, IMAGINE, VISUALISE, CONSIDER, ASK + oneself
verbs indicating thoughts and speech > verbs of physical action
I saw myself in the mirror. > He hit himself with the hammer.
The psychologically prototypical is not necessarily to be ignored in language teaching, but knowing what is central and typical in frequency terms can indicate what the bulk of the examples that a learner is exposed to should be like.
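The sense of ‘typical’ as ‘most frequent collocates’, as in the recipe for example in this section, can be illustrated with a simple count of the words that follow a phrase. This is only a minimal sketch and not a procedure from the book: the function name right_collocates, the window size and the four toy sentences are invented for illustration, and real evidence of typicality would need a large corpus.

```python
import re
from collections import Counter


def right_collocates(sentences, phrase, window=3):
    """Count words found within `window` tokens to the right of `phrase`.

    `sentences` is any iterable of strings; matching is case-insensitive.
    """
    target = phrase.lower().split()
    counts = Counter()
    for sentence in sentences:
        # Very rough tokenisation: runs of letters, apostrophes and hyphens.
        tokens = re.findall(r"[a-z'-]+", sentence.lower())
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                start = i + len(target)
                counts.update(tokens[start:start + window])
    return counts


# Invented toy data, standing in for corpus lines.
toy_corpus = [
    "Cutting the budget now is a recipe for disaster.",
    "That plan is a recipe for disaster and delay.",
    "Ignoring the warnings was a recipe for trouble.",
    "Here is a recipe for apple pie.",
]

counts = right_collocates(toy_corpus, "recipe for")
for word, freq in counts.most_common():
    print(f"{word}\t{freq}")
```

In this toy data the negative noun disaster is the most frequent right-hand collocate, while the literal use (apple pie) appears once; the counts say nothing about real English until the same count is run over a real corpus.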
The notion of typicality will be related to the idiom principle and to the reduction of ambiguity.
e.g.) Time flies.
‘we perceive time to pass quickly’ > ‘use a stop-watch to time how quickly flies move’

3.3.2 Observing meaning distinctions
Observing typical usages of near-synonyms can clarify differences in meaning.
Study on ‘semi-grammatical’ words (words which by themselves carry only a general meaning) (Partington)
e.g.) sheer, pure, complete, utter, absolute
sheer:
・with nouns of degree or magnitude, often in the pattern the sheer N of N (sheer number, the sheer weight of noise)
・in expressions indicating causality (through sheer insistence)
absolute:
・with ‘hyperbolic’ nouns (chaos, disgrace)

3.3.3 Observing meaning and pattern
The meanings of words are distinguished by the patterns or phraseologies in which they typically occur. To illustrate this, it is common to divide concordance lines into sets, each set exemplifying one meaning.
e.g.) initiative
Set 1: an initiative, a government initiative
‘something that someone (usually a government agency or other institution) starts to try to solve a problem’
Set 2: take the initiative
‘start something and so gain an advantage over a competitor’
lose the initiative
‘fail to start something and so allow a competitor to gain an advantage’
Set 3: a lot of initiative
‘the quality of being able to do things without being told’
e.g.) CONDEMN
・condemn something, condemn something as something: ‘criticize’
・condemn someone to something: ‘make something bad happen’, ‘pass sentence’
・condemn someone to do something (be condemned to do something): ‘something’ is bad or undesirable
Words with similar meaning tend to share patterns.
e.g.) it + BE + adjective + that-clause
The adjectives indicate an evaluation or judgement.
Judgements of:
・likelihood (inevitable, likely, possible, unlikely)
・clarity (apparent, clear, obvious)
・necessity (crucial, essential, important, necessary)
・significance (revealing, significant)
・goodness or badness (appropriate, fitting, etc.)
・other kinds of judgement (ironic, surprising, etc.)
‘frames’: sequences of, usually, three words in which the first and last are fixed but the middle word is not
e.g.) a … of, be … to
The frame has a numerical significance to the corpus which far outweighs the significance of any one of the triplets.
Renouf and Sinclair’s corpus:
a … of: 3,830
a lot of: 1,322
a quarter of: 174
The frame is significant to each word which occurs as the middle item in it.
The middle words belong to particular meaning groups.
e.g.) many … of
thousands, kinds, years, members: numbers, a type or aspect, a length of time, a group of people or things
Why are ‘frames’ useful?
・Frames are useful because programs can be written to identify them automatically, without the researcher knowing or guessing in advance what they will be.
・Frames show part of what is typical in a corpus and, because they incorporate variation, they are much more frequent than fixed phrases are.
・Frames and patterns are an alternative to the very general statements made by most grammars and the very specific statements that can be made about the collocations of each individual word in a language.
・Frames are particularly useful when looking at a specialized corpus, and can be used as the basis for investigating the language of a discipline.

3.3.4 Observing detail
For more specific observations about the behaviour of individual words, it is necessary to obtain more co-text.
e.g.) V + advice as to (+ wh-clause)
・a verb indicating ‘getting, giving, wanting or offering’
e.g.)
V + ANSWER as to (+ wh-clause)
・a verb indicating ‘getting, giving, wanting or offering’
・ANSWER as to follows a phrase indicating that a clear answer is not available, or that to give a clear answer is difficult or unexpected
↓ (further research with more co-text)
‘[negative] [clear] answer as to [wh-word]’ applies only to ANSWER: useless as a generalisation

3.4 Coping with a lot of data: using phraseology
Sinclair (1999): select 30 random lines and note the patterns in them, then select a different 30 and note the new patterns, then another 30, and so on until further selections of 30 lines no longer yield anything new.
‘hypothesis testing’: A small selection of lines is used as a basis for a set of hypotheses about patterns. Other searches are then employed to test those hypotheses and form new ones. This is a way of investigating a very frequent word without being faced by thousands of lines at once.

3.4.1 Suggestion
SUGGESTION + finite clause, SUGGESTION as to, SUGGESTION for, SUGGESTION of
↓
SUGGESTION of, SUGGESTION for, SUGGESTION to (no SUGGESTION as to)
↓
SUGGESTION for, SUGGESTION to, SUGGESTION as to
↓
SUGGESTION for: occurs over 1,000 times in the Bank of English corpus … a significant part of the way SUGGESTION behaves
SUGGESTION to:
・the to-infinitive clause is dependent on a word other than SUGGESTION
↓
・the to-infinitive clause explains what the suggestion is
・the to-infinitive clause means something like ‘in order to’
↓
SUGGESTION does have a pattern in which it is followed by a to-infinitive, but that pattern is not exemplified by all 200 lines of SUGGESTION to
SUGGESTION as to: followed by a clause beginning with a wh-word

3.4.2 Point
The search may be based on:
・what comes before point
・what comes after point
・a word-class
↓
Each phrase illustrates a different meaning.
Point can indicate the name of a place or a way of scoring in a game. It is used with this or that to refer back to something that has been said before.
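The batch-by-batch procedure described in 3.4 (look at 30 random lines, then a different 30, and stop when a batch yields nothing new) can be sketched in a few lines of Python. This is a hedged illustration only: the function names kwic, pattern_of and sample_until_stable are invented here, and pattern_of is a deliberately crude stand-in (node word plus the next word) for the judgement that a human analyst would actually exercise.

```python
import random


def kwic(tokens, node, span=4):
    """Return (left, node, right) triples for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, tok, right))
    return lines


def pattern_of(line):
    """Assumed toy 'pattern': the node plus the word immediately after it."""
    _, node, right = line
    return f"{node} {right[0]}" if right else node


def sample_until_stable(lines, batch_size=30, seed=0):
    """Examine successive random batches until one adds no new pattern.

    Returns the set of patterns seen and the number of batches examined.
    """
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)  # successive slices are then different random lines
    seen = set()
    batches = 0
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        new_patterns = {pattern_of(line) for line in batch} - seen
        batches += 1
        if not new_patterns:
            break  # this batch yielded nothing new, so stop
        seen |= new_patterns
    return seen, batches


# Invented toy text, repeated to imitate a frequent word.
text = "a suggestion for change and a suggestion to leave and a suggestion for reform"
lines = kwic(text.split(), "suggestion") * 20
patterns, batches = sample_until_stable(lines, batch_size=30)
print(sorted(patterns), batches)
```

Shuffling once and slicing guarantees that each batch consists of lines not yet examined, which matches ‘a different 30’; the patterns collected this way are hypotheses to be tested with further targeted searches, not findings in themselves.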
3.5 Using a wider context: observing hidden meaning
3.5.1 I must admit
A larger context than a concordance line is needed to find out about patterning when subtle meanings and usages are being observed.
e.g.) par for the course
In general terms: ‘this kind of thing is to be expected’
Channell’s investigation:
・It is used to comment on events which are negatively evaluated, not on events that are desirable.
・In spoken English, the phrase is often used by one speaker apparently to express solidarity with another speaker who is reporting a frustrating, annoying or inconvenient event.
e.g.) I must admit
(Establishing a pattern from the different instances)
・The speaker uses I must admit when saying something that is uncomplimentary, either about the hearer or about a third party.
・What the speaker says or implies contradicts what a previous speaker has said.
・The speaker says something that is potentially embarrassing to himself or herself, not to the hearer.
↓
The common theme is the need for a speaker to protect their own face or that of their hearer. (Brown and Levinson)
↓
Generalisation: I must admit mitigates a threat to face.
(Considering a counter-example)
A previous speaker has spoken at some length about her dislike of hearing swear-words in public.
I must admit I don’t like to hear bad language in the street.
The speaker may dislike hearing bad language in the street, but may not object to it in other contexts. The speaker does not actually say this, but maintains his integrity by implying a measure of disparity between their opinions.
↓
The general theory is not necessarily falsified.

3.5.2 SIT through
SIT through demonstrates what is sometimes called semantic prosody.
SIT through follows:
・have to (= connotation)
・an expression indicating that pressure has been exerted
・an expression indicating that someone does not want to do something
The experience being undergone is unpleasant in some way.
↓
sat through, sitting through
↓
SIT through is followed:
・by an indication of a specific length of time
・by an indication that the length of time is judged to be uncomfortably long
↓
Use of SIT through implies boredom or discomfort.
↓
Considering a counter-example:
our home grown industry and having sat through this delightful, acid, piece
↓
The ‘hidden meaning’ of SIT through can be overridden.
SIT through carries connotations of ‘boredom’ amassed through its typical contexts. The connotations hold unless there is very clear evidence to the contrary. Typicality is crucial here, because the connotation cannot be observed by examining one or two instances of the phrase, but depends on its most frequent contexts.

3.6 Using probes
Probes: searches to find sets of words or expressions that cannot easily otherwise be called to mind
e.g.) how men and women are typically evaluated: something/nothing + adjective + about/in + him/her
・Adjectives focusing on the construal of age and sexuality: boyish, childlike, masculine, womanly (examples found in the male list only)
・We evaluate people in terms of their moral and psychological characteristics.
・We evaluate people surprisingly often in terms of how much like other people they are, and with how much confidence we are able to understand them.
e.g.)
would like
・ways of expressing possible or hypothetical events other than with an if-clause or a when-clause: otherwise, in the event of

3.7 Issues in accessing and interpreting concordance lines
The techniques needed to access and interpret concordance lines
・the variation in using the word, the lemma or the phrase as a target;
・the need to edit the lines to separate the target phrase from others;
・the need to sort lines to make the patterning in them more visible;
・the fact that it is often necessary to look at only part of each line in a set of concordance lines in order to identify patterning;
・the need sometimes to look at more co-text than a short concordance line;
・the need to tackle a large amount of data by looking at successive groups of a small number of lines, forming and testing hypotheses;
・the need to concentrate on evidence for the ‘central and typical’;
・the need to consider apparent counter-examples: do they support the hypothesis or not?
Concordance lines present information; they do not interpret it. Interpretation requires the insight and intuition of the observer. To some extent, this is a disadvantage, as it complicates statistical work on corpora. The enormous benefit, however, is that the human eye can perceive features of language that were hitherto unguessed-at.
Corpus interpretation
・distinguishing between categories such as types of clauses
・making generalizations about the association between the way a word is used and its meaning
・making statements about ‘the way the world is’: interesting and difficult
e.g.) There are roughly twice as many instances of left-handed as right-handed in the Bank of English corpus. Why?
・there are more left-handed people in the world
・the left-handed have a higher status
・right-handedness: ‘norm’ and left-handedness: ‘deviant’; deviance is more often mentioned than normality
Looking at the lines themselves suggests that the third one is the most likely interpretation, but it is important to recognize that this is an interpretation of evidence, not ‘fact’.

*** New Terms ***
p.39 node word
p.42 typical, typicality
p.43 central, centrality
p.43 prototypical
p.45 semi-grammatical words
p.49 frames
p.52 hypothesis testing
p.60 semantic prosody
p.62 probes

References: http://www.soc.hyogo-u.ac.jp/tani/ch3.html