USING MINITAB: AUTHORSHIP ATTRIBUTION
1.
Analysing the Q.C.S. Letters in Minitab
1.1
Retrieving initial data
(a)
Start / Programs / Current Applications / Minitab / Minitab
Now retrieve the word-length data related to the three sets of Q.C.S. letters:
(b)
File / Open Worksheet
(the relevant dialog box will appear)
Move to W:\UHLE\015
(c)
Click on start.mtw then Open
(d)
Save this file to your own Y: drive under the filename start.mtw using
File / Save Current Worksheet As…
1.2
Comparing the Q.C.S. Letters
Compare the three sets of letters for consistency in word-length distribution.
First compare them graphically. To make the numbers comparable, change the
frequencies into proportions by dividing each number in a column by the column
total. You will need to deal with columns C5 C6 C7 in turn. Begin with
column C5:
(a)
Calc / Calculator… (the Calculator dialog box will appear)
(b)
In the Store result in variable: box, type: C5
(c)
In the Expression: box, type: C2/SUM(C2)
(d)
Click on OK, then note how the results are stored in column C5
You have now dealt with the first set of letters data. You need to carry out the
same procedure for the other two sets of letters data, storing the results in C6
C7:
(e)
Repeat the above process (a)-(d):
1.
Stage (b): Store … C6
Stage (c): Expression … C3/SUM(C3)
2.
Stage (b): Store … C7
Stage (c): Expression … C4/SUM(C4)
(f)
Finally save via: File / Save Current Worksheet
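For readers who want to see what the Calculator expression C2/SUM(C2) does, here is a minimal Python sketch of the same arithmetic. The counts are invented for illustration and are not the Q.C.S. data in start.mtw; the function name to_proportions is likewise hypothetical.

```python
# Hypothetical word-length counts, NOT the values stored in start.mtw.
counts = [10, 30, 40, 20]  # plays the role of Minitab column C2


def to_proportions(column):
    """Divide every entry by the column total, as C2/SUM(C2) does."""
    total = sum(column)
    return [value / total for value in column]


proportions = to_proportions(counts)  # plays the role of column C5
print(proportions)
```

Because every entry is divided by the same total, the resulting column sums to 1, which is what makes samples of different sizes comparable on one plot.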
You now have (proportional) distributions for all three sets of Q.C.S. letters in
C5 C6 C7. Plot these against word-length (C1):
(f)
Graph / Draftsman Plot…
(g)
Click in the Y variables box, then in turn click on C5 C6 C7, and click
on the Select button each time. (Then check all three column labels are
included in the box.)
(h)
Click in the X variables box, then click on C1, and click on the Select
button. (Then check the C1 column label is included in the box.)
(i)
Click on OK to view the graphs. Do the distributions look similar?
Second, having compared the three distributions graphically, do a Chi-square
test. The test will be of the null hypothesis that all three sets of letters have the
same word-length distribution (remember to use the actual not the proportional
values when you calculate Chi-square):
(j)
Stat / Tables / Chi-square Test
(the dialog box will appear)
(k)
In the Columns containing the table: box, enter the column labels:
C2-C4 then click on OK
Is there any evidence that the three samples differ?
This completes the initial analysis of the Q.C.S. letters.
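As a rough guide to what the Chi-square test computes from the table of counts, the sketch below works out the Pearson statistic and its degrees of freedom by hand. It is an illustration only: the counts are invented (not the Q.C.S. data), the function name chi_square is hypothetical, and the p-value that Minitab also reports is omitted.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    `table` is a list of rows; here each row is a word length and each
    column is one sample of letters.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every sample shared one distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: 3 word lengths (rows) by 3 samples (columns).
example = [
    [20, 30, 25],
    [40, 35, 45],
    [40, 35, 30],
]
stat, dof = chi_square(example)
print(stat, dof)
```

A large statistic relative to its degrees of freedom is evidence against the null hypothesis that the samples share one word-length distribution.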
2.
Analysing the Twain Letters
2.1
Importing the Twain data into the worksheet
(a)
File / Other Files / Import Special Text… (the relevant dialog box
will appear)
(b)
In the Store data in column(s): box, type in the column labels: C8-C12
then OK
The Import Text From File dialog box will then appear.
Move to W:\UHLE\015
(c)
Click on twain.dat then Open (check new data imported into C8-C12)
(d)
Save this whole file (containing data in C1-C12) to your own Y: drive
under the filename all.mtw using File / Save Current Worksheet As…
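The import step reads a text file into five worksheet columns. The sketch below shows the same idea in Python, assuming the file holds five whitespace-separated counts per line. That layout is a guess about twain.dat, not something this handout states; the sample values are invented, and read_columns is a hypothetical helper.

```python
import io

# Invented sample text standing in for a data file; NOT the Twain data.
sample_text = """\
3 4 2 5 1
10 12 9 11 8
"""


def read_columns(handle, n_columns):
    """Read whitespace-separated integers into n_columns separate lists."""
    columns = [[] for _ in range(n_columns)]
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != n_columns:
            raise ValueError(f"expected {n_columns} values, got {len(fields)}")
        for column, field in zip(columns, fields):
            column.append(int(field))
    return columns


# The five lists play the roles of worksheet columns C8 to C12.
c8, c9, c10, c11, c12 = read_columns(io.StringIO(sample_text), 5)
print(c8, c12)
```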
2.2
Analysing the first three Twain samples
Start by focussing on the first three of the five Twain samples you have just
imported into C8-C12. (These three groups of Twain letters were written at about the
same time as the Q.C.S. letters.) Do both a plot and a Chi-square test to compare the
samples.
First compare them graphically. To make the numbers comparable, change the
frequencies into proportions by dividing each number in a column by the column total.
You will need to deal with columns C13 C14 C15 in turn. Begin with column C13:
(a)
Calc / Calculator… (the Calculator dialog box will appear)
(b)
In the Store result in variable: box, type: C13
(c)
In the Expression: box, type: C8/SUM(C8)
(d)
Click on OK, then note how the results are stored in column C13
You have now dealt with the first set of letters data. You need to carry out the
same procedure for the other two sets of letters data, storing the results in C14 C15:
(e)
Repeat the above process (a)-(d):
1.
Stage (b): Store … C14
Stage (c): Expression … C9/SUM(C9)
2.
Stage (b): Store … C15
Stage (c): Expression … C10/SUM(C10)
(f)
Finally save via: File / Save Current Worksheet
You now have (proportional) distributions for the first three sets of Twain letters in
C13 C14 C15. Plot these against word-length (C1):
(g)
Graph / Draftsman Plot…
(h)
Click in the Y variables box, then in turn click on C13 C14 C15, and click on
the Select button each time. (Then check all three column labels are included
in the box.)
(i)
Click in the X variables box, then click on C1, and click on the Select button.
(Then check the C1 column label is included in the box.)
(j)
Click on OK to view the graphs. Do the distributions look similar?
Second, having compared the first three Twain distributions graphically, do a
Chi-square test. The test will be of the null hypothesis that these three sets of Twain
letters have the same word-length distribution (remember to use the actual not the
proportional values when you calculate Chi-square):
(k)
Stat / Tables / Chi-square Test
(the dialog box will appear)
(l)
In the Columns containing the table: box, enter the column labels: C8-C10 then
click on OK
Is there any evidence that the three samples differ?
This completes the analysis of the first three sets of Twain letters.
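The reminder to use the actual counts rather than the proportions matters because the Chi-square statistic depends on sample size. The sketch below, using invented counts and a hypothetical helper named pearson_statistic, computes the statistic once from counts and once from the matching proportions to show how different the two answers are.

```python
def pearson_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return sum(
        (observed - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )


# Invented counts: two samples (columns) of 100 words each.
counts = [
    [30, 50],
    [70, 50],
]

# The same table with each column divided by its total of 100.
proportions = [[value / 100 for value in row] for row in counts]

stat_counts = pearson_statistic(counts)
stat_props = pearson_statistic(proportions)
print(stat_counts, stat_props)
```

With 100 words per sample, the statistic from the proportions is 100 times smaller than the one from the counts, so a test based on proportions would understate the evidence.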
2.3
Analysing the full range of Twain samples
After checking the consistency of the first three samples, compare Twain’s writing
over a large span of years: early, middle and late periods. Begin by combining the first
three samples you have already worked on into one column of early work (C16):
(a)
Calc / Calculator… (the Calculator dialog box will appear)
(b)
In the Store result in variable: box, type: C16
(c)
In the Expression: box, type: C8+C9+C10
(d)
Click on OK, then note how the results are stored in column C16
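The expression C8+C9+C10 adds the three columns row by row, so each word length gets one pooled count. A minimal Python sketch of that element-wise sum follows; the numbers are invented for illustration and are not the Twain data.

```python
# Invented counts per word length for three samples; NOT the Twain data.
c8 = [5, 12, 9]
c9 = [7, 10, 11]
c10 = [6, 14, 8]

# Row-by-row sum: one pooled count per word length (the role of C16).
c16 = [a + b + c for a, b, c in zip(c8, c9, c10)]
print(c16)
```

Pooling counts in this way keeps the total number of words intact, which is why the combined column can go straight into a Chi-square test.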
So now you have columns for Twain’s early work (C16), as well as for the middle
period (C11), and the late period (C12). As usual do both a plot (using proportional
distributions) and a Chi-square test.
First compare them graphically. To make the numbers comparable, change the
frequencies into proportions by dividing each number in a column by the column total.
You will need to deal with columns C17 C18 C19 in turn. Begin with column C17:
(e)
Calc / Calculator… (the Calculator dialog box will appear)
(f)
In the Store result in variable: box, type: C17
(g)
In the Expression: box, type: C16/SUM(C16)
(h)
Click on OK, then note how the results are stored in column C17
You have now dealt with the first set of letters data. You need to carry out the
same procedure for the other two sets of letters data, storing the results in C18 C19:
(i)
Repeat the above process (e)-(h):
1.
Stage (f): Store … C18
Stage (g): Expression … C11/SUM(C11)
2.
Stage (f): Store … C19
Stage (g): Expression … C12/SUM(C12)
(j)
Finally save via: File / Save Current Worksheet
You now have (proportional) distributions for the three sets of Twain letters in C17
C18 C19. Plot these against word-length (C1):
(k)
Graph / Draftsman Plot…
(l)
Click in the Y variables box, then in turn click on C17 C18 C19, and click on
the Select button each time. (Then check all three column labels are included
in the box.)
(m)
Click in the X variables box, then click on C1, and click on the Select button.
(Then check the C1 column label is included in the box.)
Click on OK to view the graphs. Do the distributions look similar?
Second, having compared the early, middle and late Twain samples graphically, do a
Chi-square test. The test will be of the null hypothesis that these three sets of Twain
letters have the same word-length distribution (remember to use the actual not the
proportional values when you calculate Chi-square):
(n)
Stat / Tables / Chi-square Test
(the dialog box will appear)
(o)
In the Columns containing the table: box, enter the column labels: C16 C11
C12 then click on OK
Is there any evidence that the three samples differ?
This completes the analysis of the early, middle and late Twain samples.
3.
Comparing the Q.C.S. and Twain Letters
3.1
Creating one column for each set of material
Create two columns, one for the Q.C.S. material, and one for the Twain material.
Begin with the former:
(a)
Calc / Calculator… (the Calculator dialog box will appear)
(b)
In the Store result in variable: box, type: C20
(c)
In the Expression: box, type: C2+C3+C4
(d)
Click on OK, then note how the results are stored in column C20
So now you have one column (C20) for all the Q.C.S. material. Now create one
column for all the Twain material:
(e)
Calc / Calculator… (the Calculator dialog box will appear)
(f)
In the Store result in variable: box, type: C21
(g)
In the Expression: box, type: C8+C9+C10+C11+C12
(h)
Click on OK, then note how the results are stored in column C21
(i)
Finally save via: File / Save Current Worksheet
Now you have two columns, one for each set of materials. Examine them in the usual
way – plot and Chi-square test.
3.2
Comparing the Q.C.S. and Twain materials graphically
First compare them graphically. To make the numbers comparable, change the
frequencies into proportions by dividing each number in a column by the column total.
You will need to deal with columns C22 C23 in turn. Begin with column C22:
(j)
Calc / Calculator… (the Calculator dialog box will appear)
(k)
In the Store result in variable: box, type: C22
(l)
In the Expression: box, type: C20/SUM(C20)
(m)
Click on OK, then note how the results are stored in column C22
You have now dealt with the Q.C.S. column. You need to carry out the same
procedure for the Twain column, storing the results in C23:
(n)
Repeat the above process (j)-(m):
Stage (k): Store … C23
Stage (l): Expression … C21/SUM(C21)
(o)
Finally save via: File / Save Current Worksheet
You now have (proportional) distributions for the Q.C.S. and Twain columns in C22
C23. Plot these against word-length (C1):
(p)
Graph / Draftsman Plot…
(q)
Click in the Y variables box, then in turn click on C22 C23, and click on the
Select button each time. (Then check both column labels are included in
the box.)
(r)
Click in the X variables box, then click on C1, and click on the Select button.
(Then check the C1 column label is included in the box.)
(s)
Click on OK to view the graphs. Do the distributions look similar?
Second, having compared the Q.C.S. and Twain samples graphically, do a
Chi-square test. The test will be of the null hypothesis that the two sets of data have the
same word-length distribution (remember to use the actual not the proportional values
when you calculate Chi-square):
(t)
Stat / Tables / Chi-square Test
(the dialog box will appear)
(u)
In the Columns containing the table: box, enter the column labels: C20 C21
then click on OK
Did Mark Twain write the Q.C.S. letters?
Noel Heather