USING MINITAB: AUTHORSHIP ATTRIBUTION

1. Analysing the Q.C.S. Letters in Minitab

1.1 Retrieving initial data

(a) Start / Programs / Current Applications / Minitab / Minitab

Now retrieve the word-length data related to the three sets of Q.C.S. letters:

(b) File / Open Worksheet (the relevant dialog box will appear). Move to W:\UHLE\015
(c) Click on start.mtw then Open
(d) Save this file to your own Y: drive under the filename start.mtw using File / Save Current Worksheet As…

1.2 Comparing the Q.C.S. Letters

Compare the three sets of letters for consistency in word-length distribution. First compare them graphically. To make the numbers comparable, change the frequencies into proportions by dividing each number in a column by the column total. You will need to deal with columns C5 C6 C7 in turn. Begin with column C5:

(a) Calc / Calculator… (the Calculator dialog box will appear)
(b) In the Store result in variable: box, type: C5
(c) In the Expression: box, type: C2/SUM(C2)
(d) Click on OK, then note how the results are stored in column C5

You have now dealt with the first set of letters data. You need to carry out the same procedure for the other two sets of letters data, storing the results in C6 C7:

(e) Repeat the above process (a)-(d):

        Stage (b)        Stage (c)
    1.  Store … C6       Expression … C3/SUM(C3)
    2.  Store … C7       Expression … C4/SUM(C4)

(f) Finally save via: File / Save Current Worksheet

You now have (proportional) distributions for all three sets of Q.C.S. letters in C5 C6 C7. Plot these against word-length (C1):

(f) Graph / Draftsman Plot…
(g) Click in the Y variables box, then in turn click on C5 C6 C7, and click on the Select button each time. (Then check all three column labels are included in the box.)
(h) Click in the X variables box, then click on C1, and click on the Select button. (Then check the C1 column label is included in the box.)
(i) Click on OK to view the graphs.

Do the distributions look similar?
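The Calculator expression C2/SUM(C2) can be checked outside Minitab. The following is only a minimal Python sketch of that arithmetic, not part of the Minitab procedure: the function name to_proportions is hypothetical, and the counts are made-up placeholders rather than the Q.C.S. data.

```python
def to_proportions(counts):
    """Divide each frequency by the column total, as C2/SUM(C2) does."""
    total = sum(counts)
    if total == 0:
        raise ValueError("column total is zero; proportions are undefined")
    return [c / total for c in counts]


# Placeholder word-length frequencies (NOT the real Q.C.S. counts).
example_counts = [10, 30, 40, 20]
example_props = to_proportions(example_counts)
print(example_props)
```

The resulting proportions sum to 1, which is what makes columns of different sizes comparable on the same plot.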
Second, having compared the three distributions graphically, do a Chi-square test. The test will be of the null hypothesis that all three sets of letters have the same word-length distribution (remember to use the actual, not the proportional, values when you calculate Chi-square):

(j) Stat / Tables / Chi-square Test (the dialog box will appear)
(k) In the Columns containing the table: box, enter the column labels: C2-C4 then click on OK

Is there any evidence that the three samples differ?

This completes the initial analysis of the Q.C.S. letters.

2. Analysing the Twain Letters

2.1 Importing the Twain data into the worksheet

(a) File / Other Files / Import Special Text… (the relevant dialog box will appear)
(b) In the Store data in column(s): box, type in the column labels: C8-C12 then OK

The Import Text From File dialog box will then appear. Move to W:\UHLE\015

(c) Click on twain.dat then Open (check that the new data has been imported into C8-C12)
(d) Save this whole file (containing data in C1-C12) to your own Y: drive under the filename all.mtw using File / Save Current Worksheet As…

2.2 Analysing the first three Twain samples

Start by focussing on the first three of the five Twain samples you have just imported into C8-C12. (These three groups of Twain letters were written at about the same time as the Q.C.S. letters.) Do both a plot and a Chi-square test to compare the samples.

First compare them graphically. To make the numbers comparable, change the frequencies into proportions by dividing each number in a column by the column total. You will need to deal with columns C13 C14 C15 in turn. Begin with column C13:

(a) Calc / Calculator… (the Calculator dialog box will appear)
(b) In the Store result in variable: box, type: C13
(c) In the Expression: box, type: C8/SUM(C8)
(d) Click on OK, then note how the results are stored in column C13

You have now dealt with the first set of letters data.
You need to carry out the same procedure for the other two sets of letters data, storing the results in C14 C15:

(e) Repeat the above process (a)-(d):

        Stage (b)        Stage (c)
    1.  Store … C14      Expression … C9/SUM(C9)
    2.  Store … C15      Expression … C10/SUM(C10)

(f) Finally save via: File / Save Current Worksheet

You now have (proportional) distributions for the first three sets of Twain letters in C13 C14 C15. Plot these against word-length (C1):

(g) Graph / Draftsman Plot…
(h) Click in the Y variables box, then in turn click on C13 C14 C15, and click on the Select button each time. (Then check all three column labels are included in the box.)
(i) Click in the X variables box, then click on C1, and click on the Select button. (Then check the C1 column label is included in the box.)
(j) Click on OK to view the graphs.

Do the distributions look similar?

Second, having compared the first three Twain distributions graphically, do a Chi-square test. The test will be of the null hypothesis that these three sets of Twain letters have the same word-length distribution (remember to use the actual, not the proportional, values when you calculate Chi-square):

(k) Stat / Tables / Chi-square Test (the dialog box will appear)
(l) In the Columns containing the table: box, enter the column labels: C8-C10 then click on OK

Is there any evidence that the three samples differ?

This completes the analysis of the first three sets of Twain letters.

2.3 Analysing the full range of Twain samples

After checking the consistency of the first three samples, compare Twain’s writing over a large span of years: early, middle and late periods.
Begin by combining the first three samples you have already worked on into one column of early work (C16):

(a) Calc / Calculator… (the Calculator dialog box will appear)
(b) In the Store result in variable: box, type: C16
(c) In the Expression: box, type: C8+C9+C10
(d) Click on OK, then note how the results are stored in column C16

So now you have columns for Twain’s early work (C16), as well as for the middle period (C11), and the late period (C12). As usual do both a plot (using proportional distributions) and a Chi-square test.

First compare them graphically. To make the numbers comparable, change the frequencies into proportions by dividing each number in a column by the column total. You will need to deal with columns C17 C18 C19 in turn. Begin with column C17:

(e) Calc / Calculator… (the Calculator dialog box will appear)
(f) In the Store result in variable: box, type: C17
(g) In the Expression: box, type: C16/SUM(C16)
(h) Click on OK, then note how the results are stored in column C17

You have now dealt with the first set of letters data. You need to carry out the same procedure for the other two sets of letters data, storing the results in C18 C19:

(i) Repeat the above process (e)-(h):

        Stage (f)        Stage (g)
    1.  Store … C18      Expression … C11/SUM(C11)
    2.  Store … C19      Expression … C12/SUM(C12)

(j) Finally save via: File / Save Current Worksheet

You now have (proportional) distributions for the three sets of Twain letters in C17 C18 C19. Plot these against word-length (C1):

(k) Graph / Draftsman Plot…
(l) Click in the Y variables box, then in turn click on C17 C18 C19, and click on the Select button each time. (Then check all three column labels are included in the box.)
(m) Click in the X variables box, then click on C1, and click on the Select button. (Then check the C1 column label is included in the box.)

Click on OK to view the graphs.

Do the distributions look similar?
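The early-period column C16 is a row-by-row sum of C8, C9 and C10: the counts for each word length are added across the three samples. The following Python sketch illustrates that element-wise addition only; the helper name combine_counts and the numbers are hypothetical placeholders, not the Twain data.

```python
def combine_counts(*columns):
    """Add several frequency columns row by row, like C8+C9+C10."""
    if not columns:
        raise ValueError("at least one column is required")
    length = len(columns[0])
    if any(len(col) != length for col in columns):
        raise ValueError("all columns must cover the same word lengths")
    # zip(*columns) groups the counts for one word length across all samples.
    return [sum(row) for row in zip(*columns)]


# Placeholder counts for three samples (NOT the real Twain figures).
sample_a = [1, 2, 3]
sample_b = [4, 5, 6]
sample_c = [7, 8, 9]
early = combine_counts(sample_a, sample_b, sample_c)
print(early)
```

Because the combined column holds raw counts, it can be used both as input to the proportion step and directly in the Chi-square test.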
Second, having compared the early, middle and late Twain samples graphically, do a Chi-square test. The test will be of the null hypothesis that these three sets of Twain letters have the same word-length distribution (remember to use the actual, not the proportional, values when you calculate Chi-square):

(n) Stat / Tables / Chi-square Test (the dialog box will appear)
(o) In the Columns containing the table: box, enter the column labels: C16 C11 C12 then click on OK

Is there any evidence that the three samples differ?

This completes the analysis of the early, middle and late Twain samples.

3. Comparing the Q.C.S. and Twain Letters

3.1 Creating one column for each set of material

Create two columns, one for the Q.C.S. material, and one for the Twain material. Begin with the former:

(a) Calc / Calculator… (the Calculator dialog box will appear)
(b) In the Store result in variable: box, type: C20
(c) In the Expression: box, type: C2+C3+C4
(d) Click on OK, then note how the results are stored in column C20

So now you have one column (C20) for all the Q.C.S. material. Now create one column for all the Twain material:

(e) Calc / Calculator… (the Calculator dialog box will appear)
(f) In the Store result in variable: box, type: C21
(g) In the Expression: box, type: C8+C9+C10+C11+C12
(h) Click on OK, then note how the results are stored in column C21
(i) Finally save via: File / Save Current Worksheet

Now you have two columns, one for each set of materials. Examine them in the usual way: plot and Chi-square test.

3.2 Comparing the Q.C.S. and Twain materials graphically

First compare them graphically. To make the numbers comparable, change the frequencies into proportions by dividing each number in a column by the column total. You will need to deal with columns C22 C23 in turn.
Begin with column C22:

(j) Calc / Calculator… (the Calculator dialog box will appear)
(k) In the Store result in variable: box, type: C22
(l) In the Expression: box, type: C20/SUM(C20)
(m) Click on OK, then note how the results are stored in column C22

You have now dealt with the Q.C.S. column. You need to carry out the same procedure for the Twain column, storing the results in C23:

(n) Repeat the above process (j)-(m):

        Stage (k)        Stage (l)
        Store … C23      Expression … C21/SUM(C21)

(o) Finally save via: File / Save Current Worksheet

You now have (proportional) distributions for the Q.C.S. and Twain columns in C22 C23. Plot these against word-length (C1):

(p) Graph / Draftsman Plot…
(q) Click in the Y variables box, then in turn click on C22 C23, and click on the Select button each time. (Then check both column labels are included in the box.)
(r) Click in the X variables box, then click on C1, and click on the Select button. (Then check the C1 column label is included in the box.)
(s) Click on OK to view the graphs.

Do the distributions look similar?

Second, having compared the Q.C.S. and Twain samples graphically, do a Chi-square test. The test will be of the null hypothesis that the two sets of data have the same word-length distribution (remember to use the actual, not the proportional, values when you calculate Chi-square):

(t) Stat / Tables / Chi-square Test (the dialog box will appear)
(u) In the Columns containing the table: box, enter the column labels: C20 C21 then click on OK

Did Mark Twain write the Q.C.S. letters?

Noel Heather
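For readers who want to see what the Chi-square steps in this handout compute, the sketch below works out the standard Pearson statistic for a table of raw counts: each expected count is (row total × column total) / grand total. It is an illustration under assumptions, not Minitab's own routine: the function name chi_square_statistic and the counts are hypothetical, and it is assumed (not confirmed here) that Minitab reports the same statistic and degrees of freedom.

```python
def chi_square_statistic(columns):
    """Pearson chi-square for raw counts, one list per sample (column).

    Returns (statistic, degrees_of_freedom).
    """
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same number of rows")
    row_totals = [sum(col[i] for col in columns) for i in range(n_rows)]
    col_totals = [sum(col) for col in columns]
    grand_total = sum(col_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("empty rows or columns give zero expected counts")

    statistic = 0.0
    for j, col in enumerate(columns):
        for i, observed in enumerate(col):
            # Expected count under the null hypothesis of a shared distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (n_rows - 1) * (len(columns) - 1)
    return statistic, dof


# Placeholder counts for two samples (NOT the Q.C.S. or Twain data).
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(stat, dof)
```

The statistic is then compared with the Chi-square distribution on the stated degrees of freedom; a large value relative to that distribution is evidence against the null hypothesis that the samples share one word-length distribution. This is why raw counts, not proportions, must be used: the statistic depends on sample size.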