Presentation Slides - ACORN Aston Corpus Network

Key Cluster Patterns in Shakespeare
2009 Aston Symposium
22 May 2009
Mike Scott
…in pursuit of the…
"cunning'st pattern of excelling nature" (Othello)
or
but sound and fury signifying nothing?
Abstract
Key words (KWs) in Shakespeare plays have been shown to belong to certain category-types, such as theme-related KWs and character-related KWs.
Other KWs, generally the more interesting ones, seem to be pointers to other patterns indicative of quite specific features of the language, or of the status of characters or of individual sub-themes.
It may be that there is a tension between global KWs and much more localised, "bursty" ones in this regard.
The presentation turns attention now to key word clusters, that is, n-grams which are shown to occur distinctively in each individual play, or in the speeches of an individual character. The diverse types of patterns are what will be explored here.
Are n-grams a mere coincidence of relatively frequent words co-occurring frequently, so that they are but sound and fury signifying nothing?
Alas poor Yorick!
Double, double toil and trouble
And thereby hangs a tale
Friends, Romans, countrymen, lend me your ears
A blinking idiot
Beggar'd all description
yet
• Crystal & Crystal (2002) only list one-word headwords
Aims
• take previous key word (KW) analysis of Shakespeare plays up one level
• by examining KW clusters
… a proviso
• no claim to illuminate understanding of the plays,
• the objective being to understand more about keyness and key words
Clusters
sequences of consecutive words repeatedly found in corpora
• Biber's "bundles"
• n-grams
• no guarantee they are "phrases"
• In WordSmith, n is between 2 and 8
Why bother?
• increasing awareness that words don't act alone… but hang about in gangs
• and anyway some inconsistencies in what counts as one word, e.g.
• "behind" v. "in front of"
• "France" v. "Saudi Arabia" v. "United Arab Emirates"
So how should we think about words?
When you pick up a word, you pick up another two or three….
Keyness
A word is said to be "key" if
a) it occurs in the text at least as many times as the user has specified as a Minimum Frequency
b) its frequency in the text when compared with its frequency in a reference corpus is such that the statistical probability as computed by an appropriate procedure is smaller than or equal to a p value specified by the user.
(WordSmith manual)
KW Clusters
re-interpreting "word" to include "cluster"
so the questions are
1. How much overlap is there between KWs and KW clusters?
2. What (if anything) do key clusters show that KWs don't?
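The keyness definition quoted from the WordSmith manual leaves the "appropriate procedure" open. The following is a minimal sketch, assuming Dunning's log-likelihood (G2) as that procedure; this is a common choice in key word work, but the slides do not confirm it. The defaults mirror the settings reported later in the deck (p = 0.001, minimum frequency 2, negative keys excluded). The function names are hypothetical, and because items are plain strings the same code applies to single words and to clusters.

```python
import math
from collections import Counter

# Chi-square critical value for 1 degree of freedom at p = 0.001.
CRITICAL_P_001 = 10.83


def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Dunning's G2 for one item: study text versus reference corpus."""
    total = size_text + size_ref
    expected_text = size_text * (freq_text + freq_ref) / total
    expected_ref = size_ref * (freq_text + freq_ref) / total
    g2 = 0.0
    if freq_text > 0:
        g2 += freq_text * math.log(freq_text / expected_text)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * g2


def key_items(text_counts, ref_counts, min_freq=2, critical=CRITICAL_P_001):
    """Return positively key items (words or clusters), strongest first."""
    size_text = sum(text_counts.values())
    size_ref = sum(ref_counts.values())
    results = []
    for item, freq in text_counts.items():
        if freq < min_freq:
            continue  # condition (a): minimum frequency
        ref_freq = ref_counts.get(item, 0)
        # Skip negative keys: items relatively rarer in the text than in the reference.
        if freq / size_text <= ref_freq / size_ref:
            continue
        score = log_likelihood(freq, size_text, ref_freq, size_ref)
        if score >= critical:  # condition (b): p value threshold
            results.append((item, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Passing a play's cluster counts as `text_counts` and the all-plays cluster counts as `ref_counts` would give one key cluster list per play and cluster length.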
Procedures
with the 1916 OUP Shakespeare corpus at my site
• build one overall "index" which knows the positions and neighbours of each word in all 37 plays
• compute 2-word clusters using the index
• build one individual index for each of the plays
• compute 2-word clusters for each play using its index
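The index-then-clusters steps can be sketched roughly as follows. This is a toy illustration, not a description of how WordSmith builds its index internally: it assumes the text is already tokenised into a flat list of upper-case words, and all function names are hypothetical. A real run would also need to stop clusters from crossing play boundaries.

```python
from collections import Counter, defaultdict


def build_index(tokens):
    """Map each word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokens):
        index[word].append(position)
    return index


def clusters_from_index(tokens, index, n):
    """Count n-word clusters by extending every indexed position to the right."""
    counts = Counter()
    for positions in index.values():
        for start in positions:
            if start + n <= len(tokens):
                counts[" ".join(tokens[start:start + n])] += 1
    return counts


def repeated_clusters(tokens, n, min_freq=2):
    """Keep only clusters found at least min_freq times."""
    index = build_index(tokens)
    counts = clusters_from_index(tokens, index, n)
    return {cluster: freq for cluster, freq in counts.items() if freq >= min_freq}


tokens = "never never never never never pray you undo this button".upper().split()
print(repeated_clusters(tokens, 2))  # {'NEVER NEVER': 4}
```

Calling `repeated_clusters` once per play and once for the whole corpus, for each cluster length, yields the per-play and overall cluster wordlists the procedure describes.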
Procedures (cont.)
• repeat previous steps for all lengths of cluster 2 to 5
• result = 38 indexes
• 37 × 4 = 148 individual play cluster wordlists
• 4 cluster wordlists for the set of 37 plays
single-word list (all the plays)
N   Word  Freq.   %     Texts  %
1   THE   26,831  3.29  37     100.00
2   AND   24,110  2.95  37     100.00
3   I     20,536  2.51  37     100.00
4   TO    19,155  2.35  37     100.00
5   OF    15,997  1.96  37     100.00
6   A     13,980  1.71  37     100.00
7   YOU   13,855  1.70  37     100.00
8   MY    12,283  1.50  37     100.00
9   THAT  10,760  1.32  37     100.00
10  IN    10,569  1.29  37     100.00
pure grammar
2-word clusters
N   Word     Freq.  %     Texts  %
1   I AM     1,858  0.23  37     100.00
2   MY LORD  1,685  0.21  36     97.30
3   I HAVE   1,628  0.20  37     100.00
4   I WILL   1,582  0.19  37     100.00
5   IN THE   1,582  0.19  37     100.00
6   TO THE   1,518  0.19  37     100.00
7   OF THE   1,376  0.17  37     100.00
8   IT IS    1,079  0.13  37     100.00
9   TO BE    971    0.12  37     100.00
10  THAT I   914    0.11  37     100.00
I + AUX
incomplete prepositional phrases
3-word clusters
N   Word          Freq.  %     Texts  %
1   I PRAY YOU    250    0.03  34     91.89
2   I WILL NOT    214    0.03  36     97.30
3   I KNOW NOT    162    0.02  36     97.30
4   I DO NOT      160    0.02  33     89.19
5   I AM A        141    0.02  35     94.59
6   I AM NOT      139    0.02  34     91.89
7   MY GOOD LORD  132    0.02  29     78.38
8   AND I WILL    129    0.02  34     91.89
9   I WOULD NOT   126    0.02  34     91.89
10  THIS IS THE   122    0.01  36     97.30
negatives
4-word clusters
N   Word                Freq.  Texts  %
1   WITH ALL MY HEART   47     21     56.76
2   I KNOW NOT WHAT     39     20     54.05
3   GIVE ME YOUR HAND   34     19     51.35
4   I DO BESEECH YOU    33     17     45.95
5   GIVE ME THY HAND    31     22     59.46
6   I DO NOT KNOW       29     17     45.95
7   I WOULD NOT HAVE    26     18     48.65
8   AY MY GOOD LORD     25     13     35.14
9   WHAT IS THE MATTER  25     13     35.14
10  GIVE ME LEAVE TO    24     18     48.65
requesting etc., social interactions
5-word clusters
N   Word                   Freq.  Texts  %
1   I AM GLAD TO SEE       16     9      24.32
2   I THANK YOU FOR YOUR   12     11     29.73
3   FOR MINE OWN PART I    10     8      21.62
4   I HAD RATHER BE A      9      8      21.62
5   WITH ALL MY HEART AND  9      8      21.62
6   AM GLAD TO SEE YOU     8      5      13.51
7   AS I AM A GENTLEMAN    8      6      16.22
8   I PRAY YOU TELL ME     8      7      18.92
9   KNOW NOT WHAT TO SAY   8      8      21.62
10  SO I TAKE MY LEAVE     8      7      18.92
social formulae
Procedures (cont.)
• compare the 2-cluster wordlists of each play with the 2-cluster wordlist of all the plays
• repeat for 3-, 4- and 5-word clusters
• 37 × 4 = 148 key cluster lists
KW settings
• p value = 0.001
• minimum frequency = 2
• negative KW clusters excluded
Key 3-clusters in Lear
just a title
N  Concordance
1  night. Have you not spoken 'gainst the Duke of Cornwall? He's coming hither,
2  father, and given him notice that the Duke of Cornwall and Regan his duchess
3  and foolish. Holds it true, sir, that the Duke of Cornwall was so slain? Most
4  Gloucester, I'd speak with the Duke of Cornwall and his wife. Well, my
repetition!
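A concordance listing like the one above can be approximated with a simple keyword-in-context routine. This is a sketch only: it assumes lower-cased tokens with punctuation already stripped, `concordance` is a hypothetical name, and "duke of cornwall" is taken as the cluster purely because it is what the four lines share.

```python
def concordance(tokens, cluster, width=5):
    """Return (left, match, right) context for each occurrence of a cluster."""
    target = cluster.split()
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(target), right))
    return lines


tokens = "i'd speak with the duke of cornwall and his wife".split()
for left, match, right in concordance(tokens, "duke of cornwall", width=2):
    print(f"{left:>12} | {match} | {right}")
```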
When we are born, we cry that we are come
To this great stage of fools. This' a good block!
It were a delicate stratagem to shoe
A troop of horse with felt; I'll put it in proof,
And when I have stol'n upon these sons-in-law,
Then, kill, kill, kill, kill, kill, kill!
(Lear)
more repetition!
And my poor fool is hang'd! No, no, no life!
Why should a dog, a horse, a rat, have life,
And thou no breath at all? Thou'lt come no more,
Never, never, never, never, never!
Pray you, undo this button: thank you, sir.
Do you see this? Look on her, look, her lips,
Look there, look there!
</LEAR>
<STAGE DIR>
<Dies.>
</STAGE DIR>
Character-specific
• the foul fiend (Edgar)
• Tom's a cold (Edgar)
• i' the middle (Fool)
theme of the play
• dost thou know?
• thou know me?
speech-specific, rhythmic
Have more than thou showest,
Speak less than thou knowest,
Lend less than thou owest,
Ride more than thou goest,
Learn more than thou trowest,
Set less than thou throwest;
Leave thy drink and thy whore,
And keep in-a-door,
And thou shalt have more
Than two tens to a score
RQ 1 (How much overlap is there between KWs and KW clusters?)
Procedure
For selected plays (Hamlet, Romeo, Henry IV part 1, As You Like It):
1. Save the column of single word KWs as a plain text file
2. Save the column of 2-cluster KWs as a separate file too
3. Save the columns of 3-, 4- and 5-cluster KWs likewise
4. Make wordlists of these "texts"
5. Compute "detailed consistency" of these wordlists
6. Use "Set" function to classify items which appear in various listings
7. Identify the percentage of words which appear in the KW-cluster lists but not in the single word KW listings & vice-versa
8. Identify items which appear in numerous listings.
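Outside WordSmith, steps 4 to 8 amount to set operations on word types. The sketch below is one way to express them, under the assumption that the percentage is taken over all distinct types found in any listing; the slides do not spell out the denominator, and `overlap_report` is a hypothetical name.

```python
def overlap_report(single_kws, cluster_kws_by_n):
    """Compare single-word KWs with the word types found inside KW clusters."""
    # Break every key cluster into its word types, per cluster length.
    cluster_types = {
        n: {word for cluster in clusters for word in cluster.split()}
        for n, clusters in cluster_kws_by_n.items()
    }
    all_cluster_types = set().union(*cluster_types.values())
    all_types = single_kws | all_cluster_types
    only_in_clusters = all_cluster_types - single_kws
    only_in_single = single_kws - all_cluster_types
    # For each type, count how many listings (single + each length) contain it.
    listings = [single_kws] + list(cluster_types.values())
    spread = {w: sum(w in listing for listing in listings) for w in all_types}
    pct = 100 * len(only_in_clusters) / len(all_types) if all_types else 0.0
    return {
        "only_in_clusters": only_in_clusters,
        "only_in_single": only_in_single,
        "pct_only_in_clusters": pct,
        "spread": spread,
    }


report = overlap_report(
    {"JULIET", "ROMEO", "THOU"},
    {2: {"MY LORD", "THOU ART"}, 3: {"TELL ME WHERE"}},
)
print(sorted(report["only_in_single"]))  # ['JULIET', 'ROMEO']
```

The `spread` dictionary corresponds to step 8: types with a high count appear in numerous listings.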
Romeo and Juliet
• 43% (207 - 117 = 90) of the KWs come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
• 2s not found in the single KW list include high frequency grammar items (THE, MY, AT, TO etc.)
• 2s which are not found elsewhere in any cluster include SHALL
• 3s not found elsewhere include TELL, WHERE
• 4s not found elsewhere include COMMEND
types in KW list but not in KW clusters (A-C)
• AH, ALACK, AN, APOTHECARY, BED, BENVOLIO, CAPULET, CLOUDS, CORDS, CORSE
Common to 4 or 5 KW listings
• HER, O, SILVER, A, ART, BOTH, JULE, LADY, PLAGUE, SOUND, THOU, THY, WITH, YOUR
As You Like It
• 48% (190 - 98 = 92) of the KWs come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
• 2s not found in the single KW list include high frequency grammar items (THE, OF, FOR, AND)
• 2s which are not found elsewhere include HIM, WHO
• 3s not found elsewhere include AT, WOULD
types in KW list but not in KW clusters (A-C)
• ADAM, ALIENA, AMBLES, AUDREY, BEARDS, CELIA, CHARLES, CLOWN, COUNTERFEITED, COURTIER'S, COVERED, COZ, CURED
Henry IV part 1
• 43% (204 - 117 = 87) of the KWs come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
• 2s not found in the single KW list include high frequency grammar items (IN, TO, YOU) but also SIR, TRUE
• 2s which are not found elsewhere include TWO, FEAR, FIRE, CUDGEL
• 3s not found elsewhere include WELL, WHY, FATHER
• 4s not found elsewhere include GIVE, ARE, DOOR, LET
types in KW list but not in KW clusters (A-C)
• AFOOT, BANISH, BARDOLPH, CLIFTON, COMPULSION, COUNTERFEIT, COWARD
Hamlet
• 44% (140 - 79 = 61) of the KWs come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
• 2s not found in the single KW list include high frequency grammar items (MY, AND, OF) but also GOOD
• 2s which are not found elsewhere include FROM, O, OUR, IS, IN
• 3s not found elsewhere include HOW, LIFE, EXCEPT, YOUR, REVENGE, NOT, OWN
types in KW list but not in KW clusters (A-C)
• ACT, ARGAL, BERNARDO, CLOSES, CUSTOM
Common to 3 or 4 KW listings
• NUNNERY, A, HAMLET, HAVE, I, IT, LORD, OPHELIA, THE, TO, WAGER
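The percentages quoted for the four plays can be checked with a few lines of arithmetic. The sketch assumes each percentage is the difference divided by the first figure (for example 90 / 207); the slides do not state what the two figures count, so the tuple labels are only descriptive.

```python
# (first figure, subtracted figure) as printed on the slide for each play.
figures = {
    "Romeo and Juliet": (207, 117),
    "As You Like It": (190, 98),
    "Henry IV part 1": (204, 117),
    "Hamlet": (140, 79),
}


def percent_extra(total, subtracted):
    """Share of the first figure that remains after the subtraction."""
    return 100 * (total - subtracted) / total


for play, (total, subtracted) in figures.items():
    extra = total - subtracted
    print(f"{play}: {extra} of {total} = {percent_extra(total, subtracted):.1f}%")
```

Rounded to whole numbers these come to 43, 48, 43 and 44, matching the slide figures.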
RQ 1: How much overlap is there between KWs and KW clusters?
• More than 50% of the single-word KWs are in the clusters
• but the clusters add some 40% or more extra words
• not all additions are grammatical
• Key clusters tail off at 4 or 5
at 4 KWs, which play is this?
[figures: A Midsummer Night's Dream; All's Well That Ends Well; Antony & Cleopatra]
"bursty" keyness?
bursts (1): A Midsummer Night's Dream [figure]
bursts (2): Julius Caesar [figure]
bursts (3): Macbeth [figure]
bursts of burstiness: As You Like It [figure]
compare burstinesses?
King Lear 2s (part) [figure]
King Lear 3s and 4s [figure]
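The slides do not define a numerical measure of burstiness. As a rough sketch under that caveat, one could split a play into equal segments, count a cluster's occurrences in each, and report the share that falls in the busiest segment. Both function names and the choice of measure are assumptions made for illustration.

```python
def segment_counts(tokens, cluster, segments=8):
    """Count occurrences of a cluster in each of `segments` equal slices of the text."""
    target = cluster.split()
    n = len(target)
    counts = [0] * segments
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Integer arithmetic maps the start position to its slice.
            counts[i * segments // len(tokens)] += 1
    return counts


def burst_share(counts):
    """Proportion of all occurrences that fall in the single busiest slice."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


tokens = ["a"] * 10 + ["kill"] * 6 + ["b"] * 16
counts = segment_counts(tokens, "kill kill", segments=4)
print(counts, burst_share(counts))  # [0, 5, 0, 0] 1.0
```

A share near 1.0 indicates a highly localised cluster, while a share near 1 / segments indicates an evenly spread one.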
Conclusions
1. How much overlap is there between KWs and KW clusters?
Only a moderate amount; they highlight different aspects of the play
2. What (if anything) do key clusters show that KWs don't?
At the extremes they may highlight songs and very localised bursts in the play but by no means always or only this
<SHALLOW>

It is well said, in faith, sir; and it is well said indeed too.
'Better accommodated!' it is good; yea indeed, is it: good
phrases are surely and ever were, very commendable.
Accommodated! it comes of accommodo: very good; a good
phrase.
</SHALLOW>
<BARDOLPH>

Pardon me, sir; I have heard the word. 'Phrase,' call you
it? By this good day, I know not the phrase; but I will maintain
the word with my sword to be a soldier-like word, and a word
of exceeding good command, by heaven. Accommodated; that
is, when a man is, as they say, accommodated; or, when a
man is, being, whereby, a' may be thought to be
accommodated, which is an excellent thing.
</BARDOLPH>
References
• Crystal, David & Ben Crystal, 2002. Shakespeare's words. London: Penguin.
Join us in Liverpool