A Comparison of Three Different Cheating Detection Programs

Daniel Boren

Summary

The introductory programming courses at the University of Washington have used two different scripts in the recent past to compare programs turned in by their students in an effort to detect plagiarism. Both of these scripts operate by doing line-wise comparisons of each file turned in, and then print out a number indicating the likelihood that a given pair of files represents a case of cheating. Recently, a new software plagiarism detection service named "moss" (for Measure of Software Similarity) has become available from U.C. Berkeley (http://www.cs.berkeley.edu/~aiken/moss.html). This service compares files by analyzing the structure of the programs, measuring the similarity of a pair of files based on the number of tokens that match rather than the number of lines.

To compare the efficacy of these three programs, seven different versions of the same homework set were prepared. For each of the seven rounds, one student's solution was copied and altered by various means. Although all three of the cheating detection programs were eventually defeated, moss was significantly better at detecting cheating than the scripts that performed only line-wise comparisons of the files.

The Three Programs

Here is a brief description of each of the three programs and how they work:

Cheatcheck

This is a shell script written by Professor Larry Ruzzo. It functions by comparing each pair of files in a subdirectory structure. The comparison is accomplished by running diff on each pair and then piping the results to wc. This approach has the advantage of being easily implemented and using readily available tools. Disadvantages include an O(N^2) running time and a fairly simple line-wise comparison technique. This script outputs a number indicating to what degree each pair was found to be similar, subject to a threshold specified by the user. Unlike the other two programs, a low score indicates a higher probability of cheating (zero indicates two identical files).

Cheat

This program is a perl script written by Joshua Seims. It reads each line in each file it is asked to analyze, stripping out white space and inserting the line into an associative array. When duplicate entries are found, the script makes note of which two files contained the duplicate line. Subject to a configurable threshold, a number is printed out indicating how many lines were duplicated between the two files. While this script runs faster than cheatcheck, it still uses the somewhat rudimentary method of comparing two files and looking for identical lines. This program outputs higher scores to indicate a higher likelihood of cheating. (A sketch of this line-hashing technique appears at the end of this section.)

Moss

Moss was recently made available through U.C. Berkeley; the exact methods it uses are kept proprietary by its author. The information that is available states that moss actually analyzes the program structure to look for similarities. This is done by tokenizing the input and then searching for patterns of matching tokens between two files. (A rough illustration of token-level matching also appears at the end of this section.) While the other two programs could be used for homework written in nearly any language, moss presently analyzes only programs written in C, C++, Java, Pascal, ML, Lisp and Scheme. The service is available by subscription and is free of charge. Upon subscribing, the user is provided with a script that zips up the files to be analyzed and mails them off to the moss server. The results are returned via email.
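To make the line-wise technique concrete, here is a minimal sketch of the line-hashing approach that cheat takes. The original is a perl script; this C++ reconstruction is hypothetical, based only on the description above, and it omits the real script's configurable threshold.

    // Hypothetical C++ sketch of the line-hashing technique described above.
    // The real "cheat" program is a perl script; nothing here is its code.
    #include <cctype>
    #include <fstream>
    #include <iostream>
    #include <map>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Strip all white space from a line so that reformatting alone cannot
    // disguise a copied line.
    std::string stripWhitespace(const std::string& line) {
        std::string out;
        for (char c : line)
            if (!isspace(static_cast<unsigned char>(c))) out += c;
        return out;
    }

    int main(int argc, char* argv[]) {
        // Associative array mapping each stripped line to the files in
        // which it appears.
        std::unordered_map<std::string, std::vector<std::string>> seen;
        for (int i = 1; i < argc; ++i) {
            std::ifstream in(argv[i]);
            std::string line;
            while (std::getline(in, line)) {
                std::string key = stripWhitespace(line);
                if (!key.empty()) seen[key].push_back(argv[i]);
            }
        }
        // Count the lines shared by each pair of files. Trivial lines
        // (a lone brace, say) inflate the counts; the threshold in the
        // real script presumably compensates for this.
        std::map<std::pair<std::string, std::string>, int> shared;
        for (const auto& entry : seen)
            for (size_t a = 0; a < entry.second.size(); ++a)
                for (size_t b = a + 1; b < entry.second.size(); ++b)
                    if (entry.second[a] != entry.second[b])
                        ++shared[{entry.second[a], entry.second[b]}];
        for (const auto& p : shared)
            std::cout << p.first.first << " + " << p.first.second << ": "
                      << p.second << " duplicate lines" << std::endl;
        return 0;
    }

A program of this kind prints one line per pair of files that share lines, with higher counts indicating a higher likelihood of cheating, mirroring cheat's behavior.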
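Since the exact methods behind moss are proprietary, no sketch can claim to reproduce them. The following C++ fragment is only a rough, hypothetical illustration of token-level matching in general: identifiers and numbers are collapsed to generic tokens, and the two token streams are compared window by window. It simplifies aggressively; keywords are not distinguished from identifiers, comments and string literals are not handled, and the window size k is arbitrary.

    // Rough illustration of token-level matching. moss's actual algorithm
    // is proprietary; this only shows why comparing token streams, rather
    // than lines, is insensitive to renamed identifiers and reformatting.
    #include <cctype>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Reduce source text to a stream of token kinds. Every identifier
    // (including, for simplicity, keywords) becomes the single token "ID"
    // and every number becomes "NUM", so renaming a variable leaves the
    // stream unchanged.
    std::vector<std::string> tokenize(const std::string& src) {
        std::vector<std::string> toks;
        size_t i = 0;
        while (i < src.size()) {
            unsigned char c = static_cast<unsigned char>(src[i]);
            if (isspace(c)) {
                ++i;
            } else if (isalpha(c) || c == '_') {
                while (i < src.size() &&
                       (isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                    ++i;
                toks.push_back("ID");
            } else if (isdigit(c)) {
                while (i < src.size() && isdigit(static_cast<unsigned char>(src[i])))
                    ++i;
                toks.push_back("NUM");
            } else {
                toks.push_back(std::string(1, src[i]));
                ++i;
            }
        }
        return toks;
    }

    // Collect every run of k consecutive token kinds (a "window").
    std::unordered_set<std::string> windows(const std::vector<std::string>& t, size_t k) {
        std::unordered_set<std::string> w;
        for (size_t i = 0; i + k <= t.size(); ++i) {
            std::string g;
            for (size_t j = 0; j < k; ++j) g += t[i + j] + " ";
            w.insert(g);
        }
        return w;
    }

    std::string slurp(const char* path) {
        std::ifstream in(path);
        std::ostringstream ss;
        ss << in.rdbuf();
        return ss.str();
    }

    int main(int argc, char* argv[]) {
        if (argc != 3) { std::cerr << "usage: toksim file1 file2\n"; return 1; }
        const size_t k = 8;  // arbitrary window size
        auto a = windows(tokenize(slurp(argv[1])), k);
        auto b = windows(tokenize(slurp(argv[2])), k);
        size_t common = 0;
        for (const auto& g : a) common += b.count(g);
        std::cout << common << " of " << a.size() << " token windows matched\n";
        return 0;
    }

Even so, the sketch shows why token-level comparison is insensitive to renamed variables and reformatted lines, while a transformation that reorders the tokens inside every conditional, such as the if-else inversion used in the final round of this experiment, fragments the matching windows and can drive the match count toward zero.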
Method

To perform the experiment, homework 2 was selected from the summer quarter class of CSE143. A reasonable solution to this homework should have been under 100 lines long and yet still have contained some interesting structure. This set of homework solutions was believed to be free of any cases of cheating, based on the subjective opinion of the TA and on the output of all three cheating detection programs. Student 9712989's solution was chosen at random from those receiving perfect scores on the assignment. A fictitious student 9999999 was then created, along with seven different cheating scenarios in which 9999999 copies 9712989's work and then modifies it in various ways to avoid detection. These seven homework sets were submitted to each of the three programs in turn, and the results were tabulated. The entire set of files (homework for 21 students) was submitted to moss and cheat. Due to the relatively long running time of cheatcheck, it was used only to compare the two files already known to be involved in cheating.

Round 1: 9999999 submitted an exact copy of 9712989's solution.

Round 2: Only the comments were changed. All declarations and statements were left intact.

Round 3: The order of execution was scrambled in passages of two or more statements wherever this change would not affect the function of the program.

Round 4: Every variable and constant in the program was renamed.

Round 5: The initialization of the variables was separated from the declarations. That is, statements of the form

    int x = 0;

were changed to

    int x;
    ...
    x = 0;

Round 6: The program's main body consisted of a single while loop. This was changed to a do-while loop. In this particular program the change did not introduce a bug, even though the consequences of such a change might normally be more severe.

Round 7: The conditional on every if-else statement was inverted, along with the order of execution of the blocks. For example, code of the form

    if (A) B; else C;

was changed to

    if (!A) C; else B;

Mechanics of Using moss

Use of moss is straightforward. Upon subscribing to the service, the user is provided with a perl script which handles the creation of a zip file containing the programs that are to be examined. Options are available to submit sets of programs either by files or by directories. Control over the sensitivity of moss is achieved via a command line argument giving the maximum number of times a passage may appear before it is ignored. Additionally, a "base" file may be specified in order to ignore code that would be expected to appear in many programs, such as shared library code or a skeleton file provided by the instructor. For each of the seven rounds, moss was invoked from the directory /cse/courses/cse143/97su/turnin/hw2/AA:

    % moss -l cc -d */main.cpp

The option "-l cc" specifies the language as C++, and "-d" indicates that the submissions are by directory. Moss automatically mails the zipped-up homework set to the server in Berkeley. The server mails the results back to the user as soon as the analysis is complete.

The format of the results is particularly useful. The results are divided into two sections. The first section identifies pairs of files with matching code, sorted by the size of the apparent match. There is one line per pair, and each line gives both the number of tokens and the number of lines that matched between the files. The second section specifies the line numbers, for each pair of files, that appear to be the same.
It is then possible to immediately examine suspicious passages in a pair of programs and act accordingly. Below are the results returned by moss for round 1 of the experiment. As there were only two files of a suspicious nature, moss reported on only one pair out of the 21 files submitted.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 336 lines 91
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 336 lines 91
total tokens 336 + 336, total lines 129 + 129
main.cpp 39-129, main.cpp 39-129: 336
========================================================

The turnaround time for batches of homework was remarkably speedy. For this experiment, all seven submissions were returned in under ten minutes. This experiment was performed on August 27, 1997, so part of this speed might well be due to the fact that few universities were in session and the load on the server was correspondingly light.

Results

Table 1 gives the raw results of each round as reported by each of the three programs. The results are not directly comparable due to the differences in the way in which each program quantifies its findings. However, all three programs performed best in round 1 and worst in round 7. In figure 1, the results have been normalized to facilitate a comparison, so that each program's output for round 1 (where an exact copy was submitted) is equal to 1. For moss, only the number of matching tokens is considered.

                   Round 1  Round 2  Round 3  Round 4  Round 5  Round 6  Round 7
    moss (tokens)    336      336      136      336      236      150        0
    cheat             68       24       24        2        2        2        2
    cheatcheck         0      138      153      229      246      245      248

Table 1.

[Figure 1. Comparison of Cheating Detection Methods: scores of moss (tokens), cheat, and cheatcheck across rounds 1 through 7, normalized so that each program's round 1 result equals 1.]

Discussion

Round 1 is the base case in which the two files are identical. As expected, all three scripts detect this, reporting a perfect match. In round 2, the change in the comments was sufficient to cause an apparent lag in the performance of both cheat and cheatcheck. The raw data, however, is somewhat misleading here. Although cheat reported far fewer matches than in round 1, it still reported a number of matches far in excess of those found in all other pairs of files in the homework set. This would have been sufficient to attract the attention of an instructor or TA, and this case of cheating would have been detected.

The order of execution was rearranged in round 3 in places where it was not critical. All three programs showed roughly the same performance here. Note that the performance of cheat in round 2 is identical to that in round 3. Once again, cheat detected a level of similarity between these two files, relative to that of other pairs in the homework set, that would be sufficient to identify a case of cheating.

In rounds 4, 5 and 6, the changes made to the test program were fairly minor ones. The behavior of moss is clearly superior in these cases; it was the only one of the three able to identify these as cases of cheating. Indeed, for the case where only the variable names were changed (round 4), moss reported a perfect match just as it would have had no changes been made. Round 7 appears to break moss, as well as the other programs.
It may be that, since the blocks in each of the statements were small, the length of similar sequences in the two files dropped below some threshold.

Conclusions

Moss appears to be an excellent resource for help in identifying cases of cheating in programming assignments. It is robust enough to identify potential cheating in instances that would have evaded detection using the older line-based comparisons. Although it appears to be possible to fool moss by making a sufficient number of certain types of changes to a copied program, it is not at all clear how useful it would be for moss to detect this. Due to the realities of the way in which cheating cases are prosecuted, a pair of programs often needs to exhibit passages which are identical right down to white space in order for the committee to return a conviction. As a plagiarized program is increasingly modified, a sort of gray area is encountered: at what point can the case no longer be considered cheating?

Appendix: moss Results for Each Round

Round 1: Student 9999999 copies 9712989's homework exactly.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 336 lines 91
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 336 lines 91
total tokens 336 + 336, total lines 129 + 129
main.cpp 39-129, main.cpp 39-129: 336
========================================================

Round 2: Only comments are changed.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 336 lines 92
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 336 lines 92
total tokens 336 + 336, total lines 136 + 129
main.cpp 45-136, main.cpp 39-129: 336
========================================================

Round 3: Rearrange order of statements wherever possible.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 136 lines 37
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 136 lines 37
total tokens 336 + 336, total lines 142 + 129
main.cpp 106-142, main.cpp 93-129: 136
========================================================

Round 4: Rename variables.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 336 lines 92
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 336 lines 92
total tokens 336 + 336, total lines 142 + 129
main.cpp 51-142, main.cpp 39-129: 336
========================================================

Round 5: Change initialization code.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 236 lines 65
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 236 lines 65
total tokens 354 + 336, total lines 162 + 129
main.cpp 99-162, main.cpp 65-129: 236
========================================================

Round 6: Change while loop to do-while loop.

========================================================
Section 1. Sorted by size of match.
9999999 + 9712989: tokens 150 lines 43
========================================================
Section 2. Matching passages in programs (sorted by size of match).
9999999 + 9712989: tokens 150 lines 43
total tokens 347 + 336, total lines 157 + 129
main.cpp 107-148, main.cpp 78-120: 150
========================================================

Round 7: Invert comparisons and order of if-else statements (i.e., if (A) B; else C; changed to if (!A) C; else B;).

========================================================
Section 1. Sorted by size of match.
========================================================
Section 2. Matching passages in programs (sorted by size of match).
========================================================