Automatic Plagiarism Detection
Charlie Daly
Jane Horgan
Dublin City University

Overview
• Context
• How it works (overview)
• Comparison with other plagiarism detection
systems
• How it works (details)
– Marks the original (with a watermark)
– Invisibly
• Results
Context
• It is not
– catching people breaking copyright
– detecting plagiarism in essays etc.
• It only works for programs, specifically
when students submit a program for an
assignment.
• Plagiarism is a huge problem on many
programming courses.
Why are there so many systems?
• Lecturers who are also programmers get
upset when they see their students copying
their assignments.
– It is seen as an affront
– So they write a program.
• 'Efficiency' in education
=> large class sizes
=> manual detection is difficult.
So why another system?
• All previous systems use pair-wise
comparison. Individual programs are
compared against the other programs.
• This means
– they are programming-language specific
– they don't work across years.
– they cannot identify the original author.
So how does our technique work?
• When a student submits a program, the
program is marked with a watermark
indicating the author.
• If the student subsequently gives an
electronic copy of the program to another
student, then the watermark will be
recognised by the system as soon as it is
submitted.
But ...
• Need to be able to modify the original
student's file
• The watermark needs to be invisible to the
student.
The process
• The student's program is stored on a hard disk.
• The student submits the program.
• The watermark is added to the program, on the
student's own hard disk!
Compared to previous systems
+ Can detect plagiarism as soon as submitted
+ Identifies the author
+ Programming-language independent
+ Works with tiny programs
- Only works with an electronic copy
- Easy to bypass if students know about it
- The plagiarising student must get a copy after it
has been submitted by the author
RoboProf provides infrastructure
• RoboProf is a learning environment.
• Automatically sets and marks simple
assignments.
• The student submits a program, which is
compiled and run on the student's machine.
– an applet with read-write access is used to
manage the compilation and marking.
• The program output is then sent to the
server for marking.
RoboProf
• The student logs on.
• The assignment specification is returned to the student.
• The student writes a program and submits it.
• An applet in the browser compiles and runs the
program locally.
• The program output is sent to the server for marking.
• The results are returned to the student.
Part 1: modifying the student program
• Now that an applet can write to the student's
disk, it can modify the student's file (to add
the watermark).
• The only remaining problem: how do we
implement the watermark?
The Watermark
• Needs to be invisible to the student.
• Needs to encode
– the student ID
– the year
The Watermark
• Use 10 binary digits for the student ID =>
can distinguish 2^10 = 1024 students.
• Use 4 binary digits for the year.
• Also use a 10-bit ID for the assignment and
a 6-bit count of which attempt it is (RoboProf
allows students to resubmit a program to
improve the mark).
• A 4-bit checksum.
The watermark
• The binary code requires 34 bits
(10+4+10+6+4).
• This code is written directly into the file, above the program.
(student ID | year | assignment | attempt | checksum)
0000101010 0001 0000010110 000111 0000
#include <stdio.h>
int main(void)
{
}
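The 10+4+10+6+4 layout can be made concrete with a short C sketch. It assumes the fields are stored in the order student ID, year, assignment, attempt, checksum, most significant bit first, as '0'/'1' characters. `pack_watermark`, `put_bits` and `get_bits` are hypothetical names, and the checksum is passed in as a parameter because the slides do not say how it is calculated.

```c
/* Field widths from the 10+4+10+6+4 layout (34 bits in total). */
enum {
    ID_BITS = 10, YEAR_BITS = 4, ASSIGN_BITS = 10, ATTEMPT_BITS = 6, CHECK_BITS = 4,
    WM_BITS = ID_BITS + YEAR_BITS + ASSIGN_BITS + ATTEMPT_BITS + CHECK_BITS
};

/* Write the low `width` bits of `value` into out[*pos...], most significant
   bit first, as '0'/'1' characters, and advance *pos past them. */
static void put_bits(char *out, int *pos, unsigned value, int width)
{
    for (int i = width - 1; i >= 0; i--)
        out[(*pos)++] = ((value >> i) & 1u) ? '1' : '0';
}

/* Read `width` '0'/'1' characters starting at bits[start] back into a number. */
unsigned get_bits(const char *bits, int start, int width)
{
    unsigned value = 0;
    for (int i = 0; i < width; i++)
        value = (value << 1) | (bits[start + i] == '1');
    return value;
}

/* Build the watermark code as a NUL-terminated string of WM_BITS characters.
   The checksum is supplied by the caller (its algorithm is not specified). */
void pack_watermark(char out[WM_BITS + 1], unsigned student_id, unsigned year,
                    unsigned assignment, unsigned attempt, unsigned checksum)
{
    int pos = 0;
    put_bits(out, &pos, student_id, ID_BITS);
    put_bits(out, &pos, year, YEAR_BITS);
    put_bits(out, &pos, assignment, ASSIGN_BITS);
    put_bits(out, &pos, attempt, ATTEMPT_BITS);
    put_bits(out, &pos, checksum, CHECK_BITS);
    out[pos] = '\0';
}
```

For instance, `pack_watermark(buf, 42, 1, 22, 7, 0)` produces the fields 0000101010, 0001, 0000010110, 000111 and 0000, and `get_bits` recovers each number when the watermark is read back.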
Making it invisible
A space is used to represent the binary digit 0
and a tab is used to represent the binary digit 1.
0000101010
becomes
space, space, space, space, tab, space, tab, space, tab, space
=> invisible!
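The space/tab mapping can be sketched in a few lines of C. This is an illustration under the assumption that the watermark is handled as a string of '0'/'1' characters; `bits_to_whitespace` and `whitespace_to_bits` are hypothetical names, not RoboProf's actual code.

```c
#include <stddef.h>

/* Map each '0'/'1' character to whitespace: '0' -> ' ', '1' -> '\t'.
   `out` must have room for one character per bit plus the terminating '\0'. */
void bits_to_whitespace(const char *bits, char *out)
{
    size_t i = 0;
    for (; bits[i] != '\0'; i++)
        out[i] = (bits[i] == '1') ? '\t' : ' ';
    out[i] = '\0';
}

/* Inverse mapping: read spaces and tabs back into '0'/'1' characters.
   Stops at the first character that is neither (for example the newline
   that ends the watermark line) and returns the number of bits decoded. */
size_t whitespace_to_bits(const char *ws, char *out)
{
    size_t i = 0;
    for (; ws[i] == ' ' || ws[i] == '\t'; i++)
        out[i] = (ws[i] == '\t') ? '1' : '0';
    out[i] = '\0';
    return i;
}
```

Because a line made only of spaces and tabs looks blank in an editor, the encoded watermark line is easy to overlook, while the decoder can still recover every bit from it.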
Results
• We used the plagiarism detector as part of
RoboProf on a group of 283 students. There
were two main parts to the course: continuous
assessment and a programming exam.
• The continuous assessment was to be done in
the students' own time and was open to
plagiarism, whereas the programming exam
was supervised.
Results
• We compared the exam results of those who
plagiarised (40%) with those who didn't.
• The results are unsurprising: plagiarists
performed less well in the exam, and the more
they plagiarised, the worse they performed.
• Plagiarists also submitted their continuous
assessment on average a week later than their
honest peers.
[Figure: Incidence of plagiarism (frequency vs. number of programs copied)]
[Figure: Exam results (exam mark vs. number of programs copied)]
[Figure: Completion date of copied vs. original programs]
The end
Questions
• What happens if a program is submitted
which already contains a watermark?
• It can happen legitimately if a student
resubmits a program
• So the watermark is checked against the
submitter's ID, and if they don't match the
lecturer is emailed and investigates further.
• The watermark is then overwritten with the
submitter's details => can detect chains of plagiarism.
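The decision described in this answer can be sketched as a small C function. Only the comparison of the watermark's student ID with the submitter's ID comes from the slides; the enum, the function name and the assumption that decoding has already happened are illustrative.

```c
#include <stdbool.h>

/* Hypothetical outcomes of examining an incoming submission. */
typedef enum { NO_WATERMARK, OWN_RESUBMISSION, SUSPECTED_COPY } check_result;

/* Classify a submission given whether a watermark was found and, if so,
   the student ID decoded from it. */
check_result check_submission(bool has_watermark, unsigned watermark_id,
                              unsigned submitter_id)
{
    if (!has_watermark)
        return NO_WATERMARK;      /* first submission of this file */
    if (watermark_id == submitter_id)
        return OWN_RESUBMISSION;  /* legitimate resubmission by the author */
    return SUSPECTED_COPY;        /* mismatch: the lecturer is emailed */
}
```

In every case the file is then re-marked with the submitter's own watermark, so if a copied program is passed on again, the next submission points back to the previous copier rather than the original author.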
Another question
• Why did you only monitor plagiarism; why
not take any action?
• There are three answers:
– Resources: The university has machinery in
place to deal with plagiarism. It is very
bureaucratic and soaks up time.
– Some students committed plagiarism accidentally,
and we were still testing the system.
– Need corroborating evidence; can't let the trick
be known.
Question 3
• "But won't the watermark that is sent to the
server have been just created by the system?
It'll just read the watermark it generated."
• No. It reads the program and then doctors it.
The server gets the unadulterated program;
the student is left with the modified
program.
Question 4
• Any problems in practice?
• Yes: a modern IDE can detect when the
source has been modified and asks if you
wish to reload the buffer. This hasn't been
fixed yet.
• You need to correctly set up applet security.
• A student may save the file from the editor after
it has been modified on disk, overwriting the
watermark with the clean version still in the editor.
How much is original?
• Inserting a watermark without the user's knowledge
is new (as far as I know).
• Invisible whitespace has been used before to
detect copyright infringement (there, it was
inserted unknowingly by the author).