Key Cluster Patterns in Shakespeare
2009 Aston Symposium, 22 May 2009
Mike Scott

…in pursuit of the "cunning'st pattern of excelling nature" (Othello), or but sound and fury signifying nothing?

Abstract
Key words (KWs) in Shakespeare plays have been shown to belong to certain category-types, such as theme-related KWs and character-related KWs. Other KWs, generally the more interesting ones, seem to point to other patterns, indicative of quite specific features of the language, of the status of characters, or of individual sub-themes. In this regard there may be a tension between global KWs and much more localised, "bursty" ones. This presentation now turns to key word clusters, that is, n-grams shown to occur distinctively in an individual play or in the speeches of an individual character, and explores the diverse types of pattern they reveal. Are n-grams a mere coincidence of relatively frequent words co-occurring frequently, so that they are but sound and fury signifying nothing?

Alas poor Yorick!
Double, double toil and trouble
And thereby hangs a tale
Friends, Romans, countrymen, lend me your ears
A blinking idiot
Beggar'd all description
yet Crystal & Crystal (2002) list only one-word headwords

Aims
• take previous key word (KW) analysis of Shakespeare plays up one level
• by examining KW clusters
… a proviso: no claim to illuminate understanding of the plays; the objective is to understand more about keyness and key words

Clusters
• sequences of consecutive words repeatedly found in corpora
• Biber's "bundles"
• n-grams
• no guarantee they are "phrases"
• in WordSmith, n is between 2 and 8

Why bother?
• increasing awareness that words don't act alone… but hang about in gangs
• and anyway, single-word counting is inconsistent: "behind" counts as one word but "in front of" as three; "France" as one but "Saudi Arabia" as two and "United Arab Emirates" as three

So how should we think about words?
When you pick up a word, you pick up another two or three…
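The notion of a cluster as a run of consecutive words can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WordSmith's implementation: the tokeniser is a naive regular expression (lower-casing, keeping internal apostrophes as in hang'd), clusters are allowed to run across punctuation and sentence breaks, and the names `tokenize` and `count_clusters` are hypothetical. The `min_freq` parameter plays the role of a minimum frequency, and `n` is restricted to 2 to 8 to mirror the WordSmith range mentioned above.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokeniser (an assumption, not WordSmith's rules):
    lower-case alphabetic words, keeping internal apostrophes (e.g. hang'd)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def count_clusters(tokens: list[str], n: int, min_freq: int = 2) -> Counter:
    """Count every run of n consecutive tokens (an n-gram, or "cluster")
    and keep only those occurring at least min_freq times.

    Simplification (assumed): clusters may span punctuation and sentence breaks."""
    if not 2 <= n <= 8:  # mirrors the 2..8 range quoted for WordSmith
        raise ValueError("cluster length n must be between 2 and 8")
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return Counter({gram: freq for gram, freq in grams.items() if freq >= min_freq})


if __name__ == "__main__":
    sample = ("I pray you, tell me. I pray you, sir, I know not what to say. "
              "I know not what you mean.")
    words = tokenize(sample)
    for n in (2, 3, 4):
        # most frequent clusters of each length that pass the frequency threshold
        print(n, count_clusters(words, n).most_common(3))
```

Joining tokens with spaces gives readable string keys such as "i know not what"; a tuple key would be the safer choice if tokens could ever contain spaces.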
Keyness
A word is said to be "key" if
a) it occurs in the text at least as many times as the user has specified as a Minimum Frequency
b) its frequency in the text when compared with its frequency in a reference corpus is such that the statistical probability as computed by an appropriate procedure is smaller than or equal to a p value specified by the user.
(WordSmith manual)

KW Clusters
Re-interpreting "word" to include "cluster", the questions are:
1. How much overlap is there between KWs and KW clusters?
2. What (if anything) do key clusters show that KWs don't?

Procedures
With the 1916 OUP Shakespeare corpus at my site:
• build one overall "index" which knows the positions and neighbours of each word in all 37 plays
• compute 2-word clusters using the index
• build one individual index for each of the plays
• compute 2-word clusters for each play using its index

Procedures (cont.)
• repeat the previous steps for all cluster lengths from 2 to 5
• result = 38 indexes (1 overall + 37 plays); 37 × 4 = 148 individual play cluster wordlists, plus 4 cluster wordlists for the set of 37 plays (152 cluster wordlists in all)

Single-word list (all the plays)
N    Word   Freq.    %      Texts   %
1    THE    26,831   3.29   37      100.00
2    AND    24,110   2.95   37      100.00
3    I      20,536   2.51   37      100.00
4    TO     19,155   2.35   37      100.00
5    OF     15,997   1.96   37      100.00
6    A      13,980   1.71   37      100.00
7    YOU    13,855   1.70   37      100.00
8    MY     12,283   1.50   37      100.00
9    THAT   10,760   1.32   37      100.00
10   IN     10,569   1.29   37      100.00
→ pure grammar

2-word clusters
N    Word      Freq.   %      Texts   %
1    I AM      1,858   0.23   37      100.00
2    MY LORD   1,685   0.21   36      97.30
3    I HAVE    1,628   0.20   37      100.00
4    I WILL    1,582   0.19   37      100.00
5    IN THE    1,582   0.19   37      100.00
6    TO THE    1,518   0.19   37      100.00
7    OF THE    1,376   0.17   37      100.00
8    IT IS     1,079   0.13   37      100.00
9    TO BE     971     0.12   37      100.00
10   THAT I    914     0.11   37      100.00
→ I + AUX; incomplete prepositional phrases

3-word clusters
N    Word           Freq.   %      Texts   %
1    I PRAY YOU     250     0.03   34      91.89
2    I WILL NOT     214     0.03   36      97.30
3    I KNOW NOT     162     0.02   36      97.30
4    I DO NOT       160     0.02   33      89.19
5    I AM A         141     0.02   35      94.59
6    I AM NOT       139     0.02   34      91.89
7    MY GOOD LORD   132     0.02   29      78.38
8    AND I WILL     129     0.02   34      91.89
9    I WOULD NOT    126     0.02   34      91.89
10   THIS IS THE    122     0.01   36      97.30
→ negatives

4-word clusters
N    Word                 Freq.   Texts   %
1    WITH ALL MY HEART    47      21      56.76
2    I KNOW NOT WHAT      39      20      54.05
3    GIVE ME YOUR HAND    34      19      51.35
4    I DO BESEECH YOU     33      17      45.95
5    GIVE ME THY HAND     31      22      59.46
6    I DO NOT KNOW        29      17      45.95
7    I WOULD NOT HAVE     26      18      48.65
8    AY MY GOOD LORD      25      13      35.14
9    WHAT IS THE MATTER   25      13      35.14
10   GIVE ME LEAVE TO     24      18      48.65
→ requesting etc., social interactions

5-word clusters
N    Word                    Freq.   Texts   %
1    I AM GLAD TO SEE        16      9       24.32
2    I THANK YOU FOR YOUR    12      11      29.73
3    FOR MINE OWN PART I     10      8       21.62
4    I HAD RATHER BE A       9       8       21.62
5    WITH ALL MY HEART AND   9       8       21.62
6    AM GLAD TO SEE YOU      8       5       13.51
7    AS I AM A GENTLEMAN     8       6       16.22
8    I PRAY YOU TELL ME      8       7       18.92
9    KNOW NOT WHAT TO SAY    8       8       21.62
10   SO I TAKE MY LEAVE      8       7       18.92
→ social formulae

Procedures (cont.)
• compare the 2-cluster wordlist of each play with the 2-cluster wordlist of all the plays
• repeat for 3-, 4- and 5-word clusters
• 37 × 4 = 148 key cluster lists

KW settings
• p value = 0.001
• minimum frequency = 2
• negative KW clusters excluded

Key 3-clusters in Lear
→ just a title
N   Concordance
1   night. Have you not spoken 'gainst the Duke of Cornwall? He's coming hither,
2   father, and given him notice that the Duke of Cornwall and Regan his duchess
3   and foolish. Holds it true, sir, that the Duke of Cornwall was so slain? Most
4   Gloucester, I'd speak with the Duke of Cornwall and his wife. Well,

→ my repetition!
When we are born, we cry that we are come
To this great stage of fools. This' a good block!
It were a delicate stratagem to shoe
A troop of horse with felt; I'll put it in proof,
And when I have stol'n upon these sons-in-law,
Then, kill, kill, kill, kill, kill, kill!
(Lear)

→ more repetition!
And my poor fool is hang'd! No, no, no life!
Why should a dog, a horse, a rat, have life,
And thou no breath at all? Thou'lt come no more,
Never, never, never, never, never!
Pray you, undo this button: thank you, sir.
Do you see this? Look on her, look, her lips,
Look there, look there!
</LEAR> <STAGE DIR> <Dies.> </STAGE DIR>

→ character-specific
• the foul fiend (Edgar)
• Tom's a cold (Edgar)
• i' the middle (Fool)

→ theme of the play
• dost thou know?
• thou know me?

→ speech-specific, rhythmic
Have more than thou showest,
Speak less than thou knowest,
Lend less than thou owest,
Ride more than thou goest,
Learn more than thou trowest,
Set less than thou throwest;
Leave thy drink and thy whore,
And keep in-a-door,
And thou shalt have more
Than two tens to a score

RQ 1 (How much overlap is there between KWs and KW clusters?)
Procedure
For selected plays (Hamlet, Romeo and Juliet, Henry IV Part 1, As You Like It):
1. Save the column of single-word KWs as a plain text file.
2. Save the column of 2-cluster KWs as a separate file too.
3. Save the columns of 3-, 4- and 5-cluster KWs likewise.
4. Make wordlists of these "texts".
5. Compute the "detailed consistency" of these wordlists.
6. Use the "Set" function to classify items which appear in the various listings.
7. Identify the percentage of words which appear in the KW-cluster lists but not in the single-word KW list, and vice versa.
8. Identify items which appear in numerous listings.

Romeo and Juliet
There are 90 KWs (207 - 117 = 90, i.e. 43% of 207) which occur in the 2-, 3-, 4- or 5-word KW clusters but are absent from the single-word KW list.
• 2s not found in the single KW list include high-frequency grammar items (THE, MY, AT, TO etc.)
• 2s not found elsewhere in any cluster include SHALL
• 3s not found elsewhere include TELL, WHERE
• 4s not found elsewhere include COMMEND
• types in the KW list but not in KW clusters (A-C): AH, ALACK, AN, APOTHECARY, BED, BENVOLIO, CAPULET, CLOUDS, CORDS, CORSE
• common to 4 or 5 KW listings: HER, O, SILVER, A, ART, BOTH, JULE, LADY, PLAGUE, SOUND, THOU, THY, WITH, YOUR

As You Like It
There are 92 KWs (190 - 98 = 92, i.e. 48% of 190) which occur in the 2-, 3-, 4- or 5-word KW clusters but are absent from the single-word KW list.
• 2s not found in the single KW list include high-frequency grammar items (THE, OF, FOR, AND)
• 2s not found elsewhere include HIM, WHO
• 3s not found elsewhere include AT, WOULD
• types in the KW list but not in KW clusters (A-C): ADAM, ALIENA, AMBLES, AUDREY, BEARDS, CELIA, CHARLES, CLOWN, COUNTERFEITED, COURTIER'S, COVERED, COZ, CURED

Henry IV Part 1
There are 87 KWs (204 - 117 = 87, i.e. 43% of 204) which occur in the 2-, 3-, 4- or 5-word KW clusters but are absent from the single-word KW list.
• 2s not found in the single KW list include high-frequency grammar items (IN, TO, YOU) but also SIR, TRUE
• 2s not found elsewhere include TWO, FEAR, FIRE, CUDGEL
• 3s not found elsewhere include WELL, WHY, FATHER
• 4s not found elsewhere include GIVE, ARE, DOOR, LET
• types in the KW list but not in KW clusters (A-C): AFOOT, BANISH, BARDOLPH, CLIFTON, COMPULSION, COUNTERFEIT, COWARD

Hamlet
There are 61 KWs (140 - 79 = 61, i.e. 44% of 140) which occur in the 2-, 3-, 4- or 5-word KW clusters but are absent from the single-word KW list.
• 2s not found in the single KW list include high-frequency grammar items (MY, AND, OF) but also GOOD
• 2s not found elsewhere include FROM, O, OUR, IS, IN
• 3s not found elsewhere include HOW, LIFE, EXCEPT, YOUR, REVENGE, NOT, OWN
• types in the KW list but not in KW clusters (A-C): ACT, ARGAL, BERNARDO, CLOSES, CUSTOM
• common to 3 or 4 KW listings: NUNNERY, A, HAMLET, HAVE, I, IT, LORD, OPHELIA, THE, TO, WAGER

RQ 1: How much overlap is there between KWs and KW clusters?
• More than 50% of the single-word KWs are in the clusters, but the clusters add some 40% or more extra words.
• Not all additions are grammatical.
• Key clusters tail off at 4 or 5.

Key clusters at 4: which play is this?
• A Midsummer Night's Dream
• All's Well That Ends Well
• Antony and Cleopatra

"Bursty" keyness?
• bursts (1): A Midsummer Night's Dream
• bursts (2): Julius Caesar
• bursts (3): Macbeth
• bursts of burstiness: As You Like It

Compare burstinesses?
• King Lear: 2s (part)
• King Lear: 3s and 4s

Conclusions
1. How much overlap is there between KWs and KW clusters?
Only a moderate amount; they highlight different aspects of the play.
2. What (if anything) do key clusters show that KWs don't?
At the extremes they may highlight songs and very localised bursts in the play, but by no means always or only this.

<SHALLOW> It is well said, in faith, sir; and it is well said indeed too. 'Better accommodated!' it is good; yea indeed, is it: good phrases are surely and ever were, very commendable. Accommodated! it comes of accommodo: very good; a good phrase. </SHALLOW>
<BARDOLPH> Pardon me, sir; I have heard the word. 'Phrase,' call you it? By this good day, I know not the phrase; but I will maintain the word with my sword to be a soldier-like word, and a word of exceeding good command, by heaven. Accommodated; that is, when a man is, as they say, accommodated; or, when a man is, being, whereby, a' may be thought to be accommodated, which is an excellent thing. </BARDOLPH>

References
• Crystal, David & Ben Crystal. 2002. Shakespeare's Words. London: Penguin.

Join us in Liverpool