Tackling meaning and aboutness with KeyWords
Corpus Linguistics Summer Institute, Liverpool, 2 July 2008
Mike Scott, School of English, University of Liverpool
Keyness 1

Purpose
To explore the notion of keyness and its implications in corpus-based study, with reference to WordSmith

Keyness
- Words are not key in a language but in a given text
- Words can be key to a culture (Stubbs 2002, Williams 1976)
- Keyness: Importance; “Aboutness” (Phillips, 1989)

The Notion of Keyness
2 main qualities:
- Importance: “a key player”, “a key position”, the keystone of an arch
- Aboutness (Phillips, 1989): “a key point” = a main point in the text’s development and argument, what the text is “about”

Overview
Keyness, as a new territory, looks promising and has attracted colonists and prospectors. It generally appears to give robust indications of the text’s aboutness together with indicators of style.

the text’s aboutness

colonists …

and prospectors

Issues
- the issue of text section v. text v. corpus v. sub-corpus
- statistical questions: what exactly can be claimed?
- how to choose a reference corpus
- handling related forms such as antonyms

Of course it doesn’t actually understand…

… or know what is “correct”

… only look at what is found in text
… or context
… whether marked up or not …
<intro>Once upon a time ….</intro>

Context?

Levels of Context
[Figure: levels of context; one label reads “Physical environment”]

Corresponding units of meaning
- morpheme
- word
- cluster / phrase
- sentence
- paragraph
- section, chapter
- text
- (sub-) genre

If all this is so …
what is the status of the “key words” one may identify and what is to be done with them?

Issues
1. the issue of text section v. text v. corpus v. sub-corpus
2. statistical questions: what exactly can be claimed?
3. how to choose a reference corpus
4. handling related forms such as antonyms
5. what is the status of the “key words” one may identify and what is to be done with them?

text section v. text v. corpus v. sub-corpus
- text section: levels 1-5
- text: level 6
- corpus: levels 7 & 8

But these are often not clearly differentiated
- “text”, level 6: with or without mark-up, images, sounds?
- what do we mean by section, chapter (4) and other non-linguistically defined categories?
- is text itself mutating?

Internet text

Wikipedia homepage (part)

Wikipedia homepage (part)

Wikipedia article (3 parts of same article)

Wikipedia discussion from History of the stall article: latest contributor, “Talk” section

Statistics
- there is no statistical defence of the whole set of key words (KWs) but only of each one
- comparing KW p values is not advisable

Why?
Matrix text, describing a series of troubles affecting a set of crops in a certain place.
weevils and chickpeas will be much rarer words (if not rarer entities in this particular place) and will float to the top of the KW list
Troubles: hail, wind, weevils
Crops: peas, chickpeas, potatoes

choosing a reference corpus (RC)
- using a mixed-bag RC, the larger the RC the better, but a moderate-sized RC may suffice.
- the keyword procedure is fairly robust.
- KWs identified even by an obviously absurd RC can be plausible indicators of aboutness, which reinforces the conclusion that keyword analysis is robust.
- genre-specific RCs identify rather different KWs
- the aboutness of a text may not be one thing but numerous different ones.
Scott (forthcoming)

related forms
WordSmith can be asked to treat members of the same lemma as related
- table, tables
and can handle clusters
- at the end of
but otherwise ignores relations such as
- synonymy
- antonymy
- collocation

status of the KW
- not intrinsic to the word/cluster but context-bound
- a pointer to specific textual aboutness and/or style
- statistically arrived at but not established
- sometimes pointing to a pattern

status of the set of KWs
- indicative of the more general aboutness of the source text(s) and/or style
- but (as a set) not statistically proven

Shakespeare’s KWs

KWs of Hamlet
- Characters: FORTINBRAS, GERTRUDE, GUILDENSTERN, HAMLET, HAMLET'S, HORATIO, LAERTES, OPHELIA, PYRRHUS, ROSENCRANTZ
- Places: DENMARK, NORWAY
- Pronouns: I, IT, T, THEE, THOU
- Themes, events: MADNESS, PLAY, PLAYERS
- Other (“unexpected”): E'EN, LORD, MOST, MOTHER, PHRASE, VERY

Most of these are obvious & probably uninteresting….
if you know the play you already know
- it concerns Hamlet and some other characters
- it’s set in Denmark
- Ophelia goes mad.

… but some are puzzling
- Why are IT, LORD and MOST positively key in Hamlet… if they are negatively key in the other plays?
- Which characters are they most key of?
- Where are they found, how are these KWs dispersed throughout the play?

IT in Hamlet (1)
- In the plays 0.95% (about 1 word in 100)
- but in Hamlet’s speeches 1.48%: an increase of over 50% in this one character’s speeches…
- in Horatio’s speeches 2.33%: nearly 250% of the average in this one character’s speeches.
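The comparisons on the IT in Hamlet (1) slide are simple ratios of relative frequencies, and they can be checked with a few lines of arithmetic. The following is a minimal sketch using only the percentages given on the slide; the function name relative_change is a hypothetical helper introduced for illustration, not part of WordSmith.

```python
def relative_change(rate, baseline):
    """Return (ratio, percent_increase) of a rate measured against a baseline."""
    ratio = rate / baseline
    return ratio, (ratio - 1.0) * 100.0


# Percentage of running words that are IT, as given on the slide.
BASELINE_IT = 0.95  # all the plays
speakers = {"Hamlet": 1.48, "Horatio": 2.33}

results = {name: relative_change(pct, BASELINE_IT) for name, pct in speakers.items()}

for name, (ratio, increase) in results.items():
    print(f"{name}: {ratio:.2f} times the average ({increase:.0f}% increase)")
```

On these figures Hamlet's rate is roughly one and a half times the average and Horatio's roughly two and a half times, which is what the slide reports.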
IT in Hamlet (2)
In Hamlet’s speeches, distributed evenly:
[Dispersion plot: 173; per 1,000: 14.67]
In Horatio’s speeches:
[Dispersion plot: per 1,000: 23.74]

DO in Othello
- Nearly twice as frequent as in the other plays
- Characteristic of Iago (nearly twice as often) and Desdemona (more than 3 times as often)
- DOST characteristic of Othello (more than 6 times as frequent)

Iago: commanding
Concordance
1 <IAGO> Do thou meet me presently at the
2 knows you not. I'll not be far from you: do you find some occasion to anger
3 time, man. I'll tell you what you shall do. Our general's wife is now the general:
4 vow I here engage my words. <IAGO> Do not rise yet. Witness, you ever-burni
5 out to savage madness. Look! he stirs; Do you withdraw yourself a little while, He
6 speak with me; The which he promis'd. Do but encave yourself, And mark the
7 mind again. This night, Iago. <IAGO> Do it not with poison, strangle her in her
8 him so That I may save my speech. Do but go after And mark how he
9 I am none such. <IAGO> Do not weep, do not weep. Alas the day! <EMILIA> Has
10 I am sure I am none such. <IAGO> Do not weep, do not weep. Alas the day!

Desdemona: conditional
Concordance
11 warrant of thy place. Assure thee, If I do vow a friendship, I'll perform it To the
12 go seek him. Cassio, walk hereabout; If I do find him fit, I'll move your suit And seek
13 tears, my lord? If haply you my father do suspect An instrument of this your
14 and ever did, And ever will, though he do shake me off To beggarly divorcement,
15 Good faith! how foolish are our minds! If I do die before thee, prithee, shroud me In
16 tell me, Emilia, That there be women do abuse their husbands In such gross

Othello’s DOST: questioning – suspicion
Concordance
1 Ha! I like not that. <OTHELLO> What dost thou say? <IAGO> Nothing, my lord:
2 I love you. <OTHELLO> I think thou dost; And, for I know thou art full of love
3 thy brain Some horrible conceit. If thou dost love me, Show me thy thought.
4 for aught I know. <OTHELLO> What dost thou think? <IAGO> Think, my lord!
5 My noble lord,— <OTHELLO> What dost thou say, Iago? <IAGO> Did Michael
6 He did, from first to last: why dost thou ask? <IAGO> But for a
7 thought Too hideous to be shown. Thou dost mean something: I heard thee say
8 meditations lawful? <OTHELLO> Thou dost conspire against thy friend, Iago, If
9 to me as to thy thinkings, As thou dost ruminate, and give thy worst of
10 know my thoughts. <OTHELLO> What dost thou mean? <IAGO> Good name in
11 but keep 't unknown. <OTHELLO> Dost thou say so? <IAGO> She did
12 Farewell, farewell: If more thou dost perceive, let me know more; Set on
13 My noble lord,— <OTHELLO> If thou dost slander her and torture me, Never
14 you not hurt your head? <OTHELLO> Dost thou mock me? <IAGO> I mock
15 most cunning in my patience; But—dost thou hear?—most bloody. <IAGO>
16 And nothing of a man. <OTHELLO> Dost thou hear, Iago? I will be found most
17 t on the tree. O balmy breath, that dost almost persuade Justice to break her
18 in 's hand. O perjur'd woman! thou dost stone my heart, And mak'st me call

Keyword Clusters
Text-initial sections of “Hard News” (Guardian 1998-2004), studying Hoey’s Lexical Priming theory

Research Questions
Using the hard news corpus,
1. How many 3-5 word clusters are found to be key in TISC sections?
2. How many are positively and how many are negatively key?
3. What recurrent patterns can be found in the two types of key cluster?

RQs 1 & 2: Numbers of KW clusters
using a p value of 0.0000001, a minimum frequency of 3 and the log likelihood statistic,
- 8,132 key clusters altogether (in 3.2 million words of text)
- of which 7,631 were positively key and 501 negatively key
- though there is repetition, as these are 3-5 word n-grams

RQ 1: Numbers of KW clusters
Is 8 thousand a large number of distinct key text-initial clusters?
In the same amount of text there are 84 thousand 3-5 word clusters of frequency at least 5 altogether…
about one in 10 is associated with text-initial position at the .0000001 level of significance

RQ 1, continued
… is 1 in 10 a large number to be key?
In the case of SISC (sentences from paragraphs with only one sentence in), we get 507 thousand clusters, of which 2,192 are key (1,747 positively and 445 negatively), which is about 1 in 230

IT + reporting verb: positively key
- IT WAS ANNOUNCED LAST NIGHT
- IT WAS CLAIMED LAST NIGHT
- IT WAS CONFIRMED LAST NIGHT
- IT IS REVEALED TODAY

IT otherwise negatively key:
- IT IS A
- IT IS ABOUT
- IT IS EXPECTED
- IT IS GOING
- IT IS ONLY
- IT IS POSSIBLE
- IT SEEMS TO

Conclusions
keyness is a pointer to importance, which can be
- sub-textual
- textual
- intertextual

References
Berber Sardinha, Tony, 1999. Using Key Words in Text Analysis: practical aspects. DIRECT Papers 42, LAEL, Catholic University of São Paulo.
Berber Sardinha, Tony, 2004. Lingüística de Corpus. Barueri: Manole.
Culpeper, J., 2002. 'Computers, language and characterisation: An Analysis of six characters in Romeo and Juliet'. In: U. Melander-Marttala, C. Östman and M. Kytö (eds.), Conversation in Life and in Literature: Papers from the ASLA Symposium, Association Suedoise de Linguistique Appliquée (ASLA), 15. Universitetstryckeriet: Uppsala, pp. 11-30.
Kemppanen, Hannu, 2004. Keywords and Ideology in Translated History Texts: A Corpus-based Analysis. Across Languages and Cultures 5 (1), 89-106.
Rigotti, Eddo and Andrea Rocci, 2002. From Argument Analysis to Cultural Keywords (and back again). http://www.ils.com.unisi.ch/articolirigotti-rocci-keywords-published.pdf (accessed May 2007). In F. H. van Eemeren et al, Proceedings of the 5th Conference of the International Society for the Study of Argumentation. Amsterdam: SicSat, pp. 903-908.
Scott, M., 1996, with new versions in 1997, 1999, 2004. Wordsmith Tools. Oxford: Oxford University Press.
Scott, M., 1997a. "PC Analysis of Key Words -- and Key Key Words", System, Vol. 25, No. 1, pp. 1-13.
Scott, M., 1997b. "The Right Word in the Right Place: Key Word Associates in Two Languages", AAA - Arbeiten aus Anglistik und Amerikanistik, Vol. 22, No. 2, pp. 239-252.
Scott, M., 2000a. ‘Focusing on the Text and Its Key Words’, in L. Burnard & T. McEnery (eds.), Rethinking Language Pedagogy from a Corpus Perspective, Volume 2. Frankfurt: Peter Lang, pp. 103-122.
Scott, M., 2000b. ‘Reverberations of an Echo’, in B. Lewandowska-Tomaszczyk & P.J. Melia (eds.), PALC’99: Practical Applications in Language Corpora. Lodz Studies in Language, Volume 1. Frankfurt: Peter Lang, pp. 49-68.
Scott, M., 2001. ‘Mapping Key Words to Problem and Solution’, in M. Scott & G. Thompson (eds.), Patterns of Text: in honour of Michael Hoey. Amsterdam: Benjamins, pp. 109-127.
Scott, M., 2002. ‘Picturing the key words of a very large corpus and their lexical upshots – or getting at the Guardian’s view of the world’, in B. Kettemann & G. Marko (eds.), Teaching and Learning by Doing Corpus Analysis. Amsterdam: Rodopi, pp. 43-50 and cd-rom within the cover of the book.
Scott, M., 2006. "The Importance of Key Words for LSP", in Arnó Macià, E., A. Soler Cervera & C. Rueda Ramos (eds.), Information Technology in Languages for Specific Purposes: issues and prospects. New York: Springer, pp. 231-243.
Scott, M. (forthcoming). In Search of a Bad Reference Corpus. AHRC Methods Network.
Scott, M. & Tribble, C., 2006. Textual Patterns: keyword and corpus analysis in language education. Amsterdam: Benjamins.
Seale, C., Charteris-Black, J. & Ziebland, S., 2006. Gender, cancer experience and internet use: a comparative keyword analysis of interviews and online cancer support groups. Social Science and Medicine 62 (10): 2577-2590.
Tribble, Chris, 1999. "Genres, keywords, teaching: towards a pedagogic account of the language of project proposals", in L. Burnard & A. McEnery (eds.), Rethinking Language Pedagogy from a Corpus Perspective: Papers from the Third International Conference on Teaching and Language Corpora (Lodz Studies in Language). Hamburg: Peter Lang.
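
Appendix sketch: the key cluster counts reported above rest on the log likelihood statistic, a p value of 0.0000001 and a minimum frequency of 3. As a rough illustration of how such a test can work, the Python below uses the two-term log-likelihood formulation that is common in corpus linguistics. The slides do not say which exact formulation WordSmith uses, so this is an assumption, as are the function names, the treatment of the statistic as chi-square with 1 degree of freedom, the application of the minimum frequency to the study corpus only, and the invented counts.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one item in a study vs. a reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    if freq_study > 0:
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return max(2.0 * g2, 0.0)  # guard against tiny negative rounding errors


def p_value(g2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))


def keyness(freq_study, size_study, freq_ref, size_ref, max_p=1e-7, min_freq=3):
    """Return ('positive' or 'negative', G2) if the item is key, else None."""
    if freq_study < min_freq:  # assumption: threshold applies to the study corpus
        return None
    g2 = log_likelihood(freq_study, size_study, freq_ref, size_ref)
    if p_value(g2) > max_p:
        return None
    more_frequent = freq_study / size_study > freq_ref / size_ref
    return ("positive" if more_frequent else "negative", g2)


# Invented counts, for illustration only (not taken from the talk).
example = keyness(freq_study=50, size_study=10_000, freq_ref=100, size_ref=1_000_000)
no_difference = keyness(freq_study=10, size_study=1_000, freq_ref=100, size_ref=10_000)
print(example, no_difference)
```

Each item is tested on its own, which fits the point made under Statistics: the p value defends an individual key word or cluster, not the set as a whole.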