Report on SafeAssign

Report on SafeAssign 2007
Summary
SafeAssign is a plagiarism detection tool that is fully integrated with
Blackboard. It is based on SafeAssignment from ‘MyDropBox.com’, a
Canadian-based company that has been a North American alternative to Turnitin
for some time. Blackboard acquired the company and its products and now
offers SafeAssign free of charge as an extra Blackboard service.
Given our experience with using Turnitin and its integration with Blackboard, a
trial comparing the two services was conducted. The blind trial that was used
to test Turnitin in 2004 was repeated, using the same student work. A new
sample piece of material, made by copying work from open internet sources, was
also prepared to test the two systems in real time, side by side.
Only the sample piece of material elicited matches from the internet; all other
work matched internally only (i.e. to student papers in the same batch of
submissions).
A major issue with migrating current staff from Turnitin to SafeAssign is the
very different way that percentage scores for non-originality are calculated
and interpreted. Turnitin uses a percentage matching score: it takes the
best match to a source and counts the number of words that match between
the student work and the source, expressed as a percentage of the total
number of words in the piece. These scores are collated to give a final score,
so that a non-originality score of 42% means that, of every 100 words written
by the student, 42 match another source. SafeAssign, however, uses a
percentage probability score. The score given for each sentence highlighted as
containing matching text is a measure of the probability that the two
sentences match, while the overall score given to the piece as a whole is an
‘indicator’ of what percentage of matches has been found. This difference is
likely to cause major confusion for staff interpreting the two systems.
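The word-based score described above can be sketched in a few lines of Python.
This is only an illustration of the arithmetic as this report describes it, not
Turnitin's actual algorithm. The function name and the split of matched words
between two sources (30 and 12) are hypothetical, and the sketch assumes that
matches to different sources do not overlap.

```python
def matching_percentage(matched_words_per_source, total_words):
    """Word-based non-originality score: matched words as a share of all words.

    matched_words_per_source: matched word counts, one per source
    (assumed not to overlap). total_words: words in the student's piece.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    matched = sum(matched_words_per_source)
    return 100.0 * matched / total_words


# Hypothetical piece of 100 words with 30 + 12 words matching two sources.
print(matching_percentage([30, 12], 100))  # 42.0
```

A score produced this way can be read directly as "42 of every 100 words match
a source", which is not how a probability score should be read.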
SA Results and reports
General comments on reports:
Pros:
• Simple format, clear option to re-submit for rescanning without
particular sources
• Offers direct email link - URL for emailing to share with colleagues
• Once colour highlighting turned on, straightforward to see how
matching text is placed (layout similar to old classic report of Turnitin)
• Print version is in black and white using numbers and italics to
identify plagiarised passages
Cons:
• Matching text initially highlighted all in the same colour – bright blue.
To highlight individual sources in ‘highlighter’ style colour, must click
on each source
• No option to remove quotes or references
• ‘Side by side’ equivalent view limited: puts both paragraphs on top of
one another in a pop-up box, so they must be compared by eye, without colour
or other indication to point out matching text.
Gross level of copying is shown initially in blue text:
Can highlight sources in colour:
Note, in this case there were two sources, both highlighted in the same pale
blue.
The whole sentence is highlighted regardless of the percentage of matching
text, so a whole sentence may be highlighted when only a few of its words
actually match.
For example the whole sentence written by the student was
‘Rett Syndrome affects 1 in 15,000 liveborn girls and it is thought that between
70-95% of all cases are sporadic and so will only occur once in the family and are
caused by spontaneous mutation’ and the whole sentence was marked out in blue
text or highlighted.
URL:
/view-paperdisplay.do?paperId=1648585&inst_id=1636028
Matching: 64%
Uploaded Manuscript:
Rett Syndrome affects 1 in 15,000 liveborn girls and it is
thought that between 70-95% of all cases are sporadic and
so will only occur once in the family and are caused by
spontaneous mutation
Internet Source:
Between 70-95% of Rett syndrome cases are sporadic
therefore will only occur once in the family
In fact the only matching words are:
Rett Syndrome affects 1 in 15,000 liveborn girls and it is thought that between
70-95% of all cases are sporadic and so will only occur once in the family and are
caused by spontaneous mutation
So of the original sentence of 35 words, 15 words matched the source (not an
internet source, but another student paper; this is identified in the source
listing, and by the complex URL in the box). I calculate this to be 43%
matching (15/35 = 42.9%), not 64% as stated.
Another example:
URL:
/view-paperdisplay.do?paperId=1648585&inst_id=1636028
Matching: 64%
Uploaded Manuscript: Mutations in MeCP2 are lethal in males because males only
have one copy of the X chromosome so compensation
cannot occur like in does in females
Internet Source:
Mutations in MeCP2 are lethal in males because unlike
females (XX), males have X and Y hence lack a second
copy of the X-chromosome that compensates for a
defective one
Mutations in MeCP2 are lethal in males because males only have one copy of the
X chromosome so compensation cannot occur like in does in females
26 words, 17 matching = 65%
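The two hand calculations above can be checked with a short Python sketch. The
sentences are the students' sentences as quoted in this report. The matching
word counts (15 and 17) are the figures counted by hand above, since the
highlighting that marked the individual matching words is not reproduced here.
The function name is hypothetical, and splitting on whitespace is an
assumption about how words were counted.

```python
def word_match_percentage(sentence, matching_words):
    """Return (total words, percentage of words that match) for one sentence."""
    total = len(sentence.split())  # words = whitespace-separated tokens
    return total, 100.0 * matching_words / total


rett = ("Rett Syndrome affects 1 in 15,000 liveborn girls and it is thought "
        "that between 70-95% of all cases are sporadic and so will only occur "
        "once in the family and are caused by spontaneous mutation")
mecp2 = ("Mutations in MeCP2 are lethal in males because males only have one "
         "copy of the X chromosome so compensation cannot occur like in does "
         "in females")

for sentence, matched in ((rett, 15), (mecp2, 17)):
    total, pct = word_match_percentage(sentence, matched)
    print(f"{total} words, {matched} matching = {pct:.1f}%")
# 35 words, 15 matching = 42.9%
# 26 words, 17 matching = 65.4%
```

Both sentences were reported by SafeAssign as 64% matching, although their
word-based percentages differ by more than 20 points.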
Note: SafeAssign's own guidance on its reports states that the percentages are
not percentages of matching text but probability scores. From
http://wiki.safeassign.com/display/SAFE/Interpret+Reports:
Sentence matching scores are the percentage probability that two sentences have the same
meaning.
This number can also be interpreted as the reciprocal to the probability that these two
sentences are similar by chance. For example, a score of 90 percent means that there is a 90
percent probability that these two sentences are the same and a 10 percent probability that
they are similar by chance and not because the submitted paper includes content from the
existing source (whether or not it is appropriately attributed).
Overall score is an indicator of what percentage of the submitted paper matches existing
sources. This score is a warning indicator only and papers should be reviewed to see if the
matches are properly attributed.

Scores below 15 percent: These papers typical include some quotes and few
common phrases or blocks of text that match other documents. These papers typically
do not require further analysis, as there is no evidence of the possibility of plagiarism in
these papers.

Scores between 15 percent and 40 percent: These papers include extensive quoted
or paraphrased material or they may include plagiarism. These papers should be
reviewed to determine if the matching content is properly attributed.

Scores over 40 percent: There is a very high probability that text in this paper was
copied from other sources. These papers include quoted or paraphrased text in excess
and should be reviewed for plagiarism.
Test piece created to see what would be detected by both systems:
Sample 1: Contained 6 paragraphs, 5 wholly copied from online sources, 1 original (written by
me). The copied sources were:

Source used                                                                                    Number of words  Percentage of total piece (594 words)
1. http://wiki.safeassign.com/display/SAFE/About+SafeAssign (20/8/07)                          63               11%
2. http://en.wikipedia.org/wiki/Dna (20/8/07)                                                  93               16%
3. http://en.wikipedia.org/wiki/Wiki (20/8/07)                                                 45               8%
4. http://news.bbc.co.uk/1/hi/entertainment/6500753.stm (story on plagiarism from March 2007)  88               14%
5. http://www.guardian.co.uk/usa/story/0,,1761487,00.html (story on plagiarism from 2006)      210              35%
6. original paragraph plus title                                                               97               16%

Sources 2-5 were likely to be referenced or re-used elsewhere, or have
quotations or matching text elsewhere online.
Turnitin
Results from submission of sample 1: Picked up some plagiarism in all but
one of the plagiarised pieces, though limited in places (overall score 35%).
Got both wiki pieces correctly.
Turnitin results by paragraph (overall score: 35%)

Paragraph 1
  Matching text found: 3% match (publications): "Blackboard Unveils Plagiarism
  Prevention Service; SafeAssign, Available as New Blackboard Beyond Ser",
  Internet Wire, July 10 2007 Issue
Paragraph 2
  Matching text found: 16% match (internet from 16/08/07):
  http://en.wikipedia.org/wiki/Deoxyribose_nucleic_acid
  Comment: All of paragraph copied was found, plus two extra words by chance
  from next para
Paragraph 3
  Matching text found: 7% match (internet from 15/07/07):
  http://en.wikipedia.org/wiki/Wiki
  Comment: All except first 4 words found
Paragraph 4
  Matching text found: No matches
Paragraph 5
  Matching text found:
    5% match (publications): "NRI teen novelist 'sorry' beyond words.", The
    Times of India, April 27 2006 Issue
    2% match (publications): "Columbia U.: Harvard U. writer says copying from
    Columbia U. alum's book 'unintentional'.", The America's Intelligence
    Wire, April 25 2006 Issue
    1% match (internet from 13/01/07):
    http://www.gg2.net/news/IN_fE/Indianborn+teeIN_fE_317~26_04_2006.asp
  Comment: Three different sources listed here, total of
Paragraph 6
  Matching text found: No matches

SafeAssign
Each time the report is created, it seems to be generated afresh. Therefore, if
the report is resubmitted with sources excluded, new sources are found as
expected; however, there is no way to re-include the original sources
afterwards or to return to the original report. Here, I removed the two
original sources, which picked up a further three sources. I then tried to
re-include the first two sources and resubmitted again, only for the report to
come up with three sources this time.
On the first run, SafeAssign correctly attributed and highlighted two of the
five pieces. However, of the three it missed, one was from its own wiki and two
were from Wikipedia (this would be my first target for a place to crawl…).
As I cannot return to the original version, these results are from the third
refresh; however, they represent the best overall score of all three reports.
SafeAssign results by paragraph (overall score: 68%; note: on third refresh)

Paragraph 1
  Matching text found: None
Paragraph 2
  Matching text found:
  http://semcents.com/2007/07/05/building-blocks-of-life.aspx
  Comment: Picked up all sentences, listed as 96%, 97% and 100% probability of
  matching
Paragraph 3
  Matching text found: None
Paragraph 4
  Matching text found: http://news.bbc.co.uk/1/hi/entertainment/6500753.stm
  Comment: Correct source, fully picked up but noted as 82% (first sentence on
  source contained title), 100%, 100%, 100% for each sentence
Paragraph 5
  Matching text found:
  http://books.guardian.co.uk/news/articles/0,,1761488,00.html
  Comment: All correct, all matching text identified
Paragraph 6
  Matching text found: None

Note: I used the option to save as HTML, but when re-opened, the file was very
difficult to read. Image files were missing and there was no highlighting.
A question has been raised as to whether papers in peer-reviewed journals can
be detected by SafeAssign. All testing so far suggests not. The Human Genome
paper in Science, copied from Science, was picked up, but on another open
site (eurochromatin) rather than on the Science site, where Turnitin detected
it.
Technical notes
SA Building Block installed on test server 12 July 2007; Bb version
7.2.383.0 in testing.