Report on SafeAssign 2007

Summary

SafeAssign is a plagiarism detection tool that is fully integrated with Blackboard. It is based on SafeAssignment from MyDropBox.com, a Canadian-based company that has been an alternative to Turnitin in North America for some time. Blackboard acquired the company and its products and now offers SafeAssign free of charge as an extra Blackboard service.

Given our experience of using Turnitin and its integration with Blackboard, a trial comparing the two services was conducted. The blind trial used to test Turnitin in 2004 was repeated, using the same student work. A new sample piece, made by copying material from open internet sources, was also created to test the two systems side by side in real time. Only the sample piece produced matches from the internet; all other work matched internally only (i.e. to student papers in the same batch of submissions).

A major issue in migrating current staff from Turnitin to SafeAssign is the very different way that percentage scores for non-originality are calculated and interpreted. Turnitin uses a percentage matching score: it takes the best match to a source and counts the number of words that match between the student work and the source, expressed as a percentage of the total number of words in the piece. These scores are collated to give a final score, so a non-originality score of 42% means that, of every 100 words written by the student, 42 match another source.

SafeAssign, however, uses a percentage probability score. The score given for each sentence highlighted as containing matching text is a measure of the probability that the student's sentence and the source sentence match; the overall score given to the piece as a whole is an 'indicator' of what percentage of the piece matches existing sources. This will lead to major confusion for staff interpreting scores from the two systems.
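The Turnitin-style percentage matching score described above can be sketched in a few lines of Python. This is an illustration only: the function names are my own and are not part of either product, and the collation step assumes that passages matched to different sources do not overlap, so their word counts can simply be added. How Turnitin actually collates overlapping matches is not covered in this report.

```python
from typing import Sequence


def percent_matching(matched_words: int, total_words: int) -> float:
    """Words matching a source, as a percentage of all words in the piece."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= matched_words <= total_words:
        raise ValueError("matched_words must be between 0 and total_words")
    return 100.0 * matched_words / total_words


def overall_score(matched_per_source: Sequence[int], total_words: int) -> float:
    """Collate per-source matched word counts into one score.

    Assumption (hypothetical): matched passages from different sources
    do not overlap, so the counts are simply summed.
    """
    return percent_matching(sum(matched_per_source), total_words)


if __name__ == "__main__":
    # The example from the summary: 42 of 100 words match another source.
    print(percent_matching(42, 100))
    # A hypothetical 100-word piece matching two sources (30 + 12 words).
    print(overall_score([30, 12], 100))
```

A SafeAssign sentence score cannot be reproduced this way, because it is reported as a probability that two sentences match and not as a proportion of matching words.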
SA Results and reports

General comments on reports:

Pros:
- Simple format, with a clear option to re-submit for rescanning without particular sources
- Offers a direct email link (a URL that can be emailed to share with colleagues)
- Once colour highlighting is turned on, it is straightforward to see how matching text is placed (layout similar to the old classic Turnitin report)
- Print version is in black and white, using numbers and italics to identify plagiarised passages

Cons:
- Matching text is initially highlighted all in the same colour (bright blue). To highlight individual sources in 'highlighter' style colour, each source must be clicked
- No option to remove quotes or references
- The 'side by side' equivalent view is limited: it puts both paragraphs on top of one another in a pop-up box, and they must be compared by eye, with no colour or other indication to point out the matching text

[Screenshot: gross level of copying is shown initially in blue text]

[Screenshot: sources can be highlighted in colour]

Note that in this case there were two sources, both highlighted in the same pale blue. The whole sentence is highlighted regardless of the percentage of matching text, so a whole sentence may be highlighted when only a few of its words match. For example, the whole sentence written by the student was 'Rett Syndrome affects 1 in 15,000 liveborn girls and it is thought that between 70-95% of all cases are sporadic and so will only occur once in the family and are caused by spontaneous mutation', and the whole sentence was marked out in blue text or highlighted.
URL: /view-paperdisplay.do?paperId=1648585&inst_id=1636028
Matching: 64%
Uploaded Manuscript: Rett Syndrome affects 1 in 15,000 liveborn girls and it is thought that between 70-95% of all cases are sporadic and so will only occur once in the family and are caused by spontaneous mutation
Internet Source: Between 70-95% of Rett syndrome cases are sporadic therefore will only occur once in the family

In fact the only matching words are [highlighting of the matching words not reproduced]:

Rett Syndrome affects 1 in 15,000 liveborn girls and it is thought that between 70-95% of all cases are sporadic and so will only occur once in the family and are caused by spontaneous mutation

So of the original sentence of 35 words, 15 words matched the source (not an internet source, but another student paper; this is identified in the source listing, and by the complex URL in the box). I calculate this to be about 43% matching (15/35), not 64% as stated.

Another example:

URL: /view-paperdisplay.do?paperId=1648585&inst_id=1636028
Matching: 64%
Uploaded Manuscript: Mutations in MeCP2 are lethal in males because males only have one copy of the X chromosome so compensation cannot occur like in does in females
Internet Source: Mutations in MeCP2 are lethal in males because unlike females (XX), males have X and Y hence lack a second copy of the X-chromosome that compensates for a defective one

Matching words [highlighting not reproduced]:

Mutations in MeCP2 are lethal in males because males only have one copy of the X chromosome so compensation cannot occur like in does in females

26 words, 17 matching = 65%

Note: SafeAssign's information on its reports states that the percentages are not percentages of matching text but probability scores.

From http://wiki.safeassign.com/display/SAFE/Interpret+Reports

Sentence matching scores are the percentage probability that two sentences have the same meaning. This number can also be interpreted as the reciprocal to the probability that these two sentences are similar by chance.
For example, a score of 90 percent means that there is a 90 percent probability that these two sentences are the same and a 10 percent probability that they are similar by chance and not because the submitted paper includes content from the existing source (whether or not it is appropriately attributed).

Overall score is an indicator of what percentage of the submitted paper matches existing sources. This score is a warning indicator only and papers should be reviewed to see if the matches are properly attributed.

Scores below 15 percent: These papers typically include some quotes and few common phrases or blocks of text that match other documents. These papers typically do not require further analysis, as there is no evidence of the possibility of plagiarism in these papers.

Scores between 15 percent and 40 percent: These papers include extensive quoted or paraphrased material or they may include plagiarism. These papers should be reviewed to determine if the matching content is properly attributed.

Scores over 40 percent: There is a very high probability that text in this paper was copied from other sources. These papers include quoted or paraphrased text in excess and should be reviewed for plagiarism.

Test piece created to see what would be detected by both systems:

Sample 1: Contained 6 paragraphs, 5 wholly copied from online sources, 1 original (written by me). The copied sources were:

Source used | Number of words | Percentage of total piece (594 words)
1. http://wiki.safeassign.com/display/SAFE/About+SafeAssign (20/8/07) | 63 | 11%
2. http://en.wikipedia.org/wiki/Dna (20/8/07) | 93 | 16%
3. http://en.wikipedia.org/wiki/Wiki (20/8/07) | 45 | 8%
4. http://news.bbc.co.uk/1/hi/entertainment/6500753.stm (story on plagiarism from March 2007) | 88 | 14%
5. http://www.guardian.co.uk/usa/story/0,,1761487,00.html (story on plagiarism from 2006) | 210 | 35%
6. Original paragraph plus title | 97 | 16%

Sources 2-5 were likely to be referenced or re-used elsewhere, or to have quotations or matching text elsewhere online.

Turnitin

Results from submission of sample 1: Turnitin picked up some plagiarism in all but one of the plagiarised pieces, though the matching was limited in places (overall score 35%). It got both wiki pieces correctly.

Turnitin overall score: 35%

Paragraph number | Matching text found?
1 | 3% match (publications) "Blackboard Unveils Plagiarism Prevention Service; SafeAssign, Available as New Blackboard Beyond Ser", Internet Wire, July 10 2007 Issue
2 | 16% match (internet from 16/08/07) http://en.wikipedia.org/wiki/Deoxyribose_nucleic_acid. All of the copied paragraph was found, plus two extra words by chance from the next paragraph
3 | 7% match (internet from 15/07/07) http://en.wikipedia.org/wiki/Wiki. All except the first four words found
4 | No matches
5 | Three different sources listed here, total of [figure missing]: 5% match (publications) "NRI teen novelist 'sorry' beyond words.", The Times of India, April 27 2006 Issue; 2% match (publications) "Columbia U.: Harvard U. writer says copying from Columbia U. alum's book 'unintentional'.", The America's Intelligence Wire, April 25 2006 Issue; 1% match (internet from 13/01/07) http://www.gg2.net/news/IN_fE/Indianborn+teeIN_fE_317~26_04_2006.asp
6 | No matches

SafeAssign

Each time the report is created, it seems to be generated afresh. If the report is resubmitted with sources excluded, new sources are found as expected, but there is then no way to re-include the original sources or to return to the original report. Here, I removed the two original sources, which picked up a further three sources. I then tried to re-include the first two sources and resubmitted again, only this time it came up with three sources.
On the first run, SafeAssign correctly attributed and highlighted two of the five pieces. However, of the three it missed, one was from its own wiki and two were from Wikipedia (this would be my first target for a place to crawl...). As I cannot return to the original version, these results are from the third refresh; however, they represent the best overall score of all three reports.

SafeAssign overall score: 68% (note: on third refresh)

Paragraph number | Matching text found?
1 | None
2 | http://semcents.com/2007/07/05/building-blocks-of-life.aspx. Picked up all sentences, listed as 96%, 97% and 100% probability of matching
3 | None
4 | http://news.bbc.co.uk/1/hi/entertainment/6500753.stm. Correct source, fully picked up but noted as 82% (first sentence on source contained title), 100%, 100%, 100% for each sentence
5 | http://books.guardian.co.uk/news/articles/0,,1761488,00.html. All correct, all matching text identified
6 | None

Note: I used the option to save as HTML, but when re-opened the report was very difficult to read. Image files were missing and there was no highlighting.

A question has been raised as to whether papers in peer-reviewed journals can be detected by SafeAssign. All testing so far suggests not. The Human Genome paper in Science, copied from Science, was picked up, but on another open site (eurochromatin) and not on the Science site, where Turnitin detected it.

Technical notes

SA Building Block installed on test server 12 July 2007, Bb version Bb 7.2.383.0 in testing.