AUTOMATIC SYSTEM FOR GRADING
MULTIPLE CHOICE QUESTIONS
Muaaz Habeek
Faculty of NTIC
University of Abdelhamid Mehri - Constantine 2, Algeria.
Muaaz.h.is@gmail.com
Charaf Eddine Dridi
Faculty of NTIC
University of Abdelhamid Mehri - Constantine 2, Algeria.
1100charaf@gmail.com
Mohamed Badeche
Faculty of NTIC
University of Abdelhamid Mehri - Constantine 2, Algeria.
badeche_mohamed@yahoo.fr
ABSTRACT
Although technology for automatic grading of multiple-choice exams exists, it is neither efficient nor as automatic as it claims to be. All proposed methods have a predefined answer sheet format that looks like a crossword table or a chessboard. Because of this format, all questions must have the same number of choices. Such an answer sheet is not clear, and candidates taking the exam can and will accidentally mark the wrong cell in the table. Most of these methods also assume that there is only one possible answer for every question. This paper proposes an algorithm that does not require any special format, works with all scanning resolutions and is actually fast.
KEYWORDS
Multiple Choice Questions, grading, exams.
1 INTRODUCTION
Nowadays, nearly everything is computerized, especially dumb, dull or dangerous tasks, like grading Multiple Choice Questions (MCQ) exams. Automatic MCQ grading is a relatively young research topic. The first systems were developed using Optical Mark Recognition (OMR) forms coupled with OMR software and dedicated scanners. These systems are oriented towards big organizations and universities, but small institutes and individual teachers cannot afford such costly systems [1] [2] [3].
Automatic MCQ grading systems are based on the extraction of response marks from scanned exam answer sheets. Many methods, such as [4] and [5], impose a special sheet format and do not support all types of MCQ, such as conventional MCQ, alternative MCQ and complex MCQ [6].
Some systems have test generators [7], i.e. they provide the ability to generate the forms. This kind of system is often the worst because of its limitations and its strict specifications and conditions (e.g. the same special form format for all exams, and the candidate identification (id) written in a complex grid by checking boxes).
Many articles have been published in this domain, yet the existing software fails to deliver an easy and practical solution because of its poor image processing techniques.
In this paper, we present a method that aims to fix all these issues and imposes as few restrictions as possible on the users (candidate / examiner). This method does not require a special sheet format, although it may require certain additions to the answer sheet, such as a rectangular box to contain the candidate id. The candidate id is written in a more natural way, as seven-segment digits. This method allows a different number of options per question; for example, the first question may have two options while the second may have twenty. The options do not have to be marked with squares; they can use any type of enumeration.
2 PROPOSED SYSTEM
Unlike other programs, the paper layout is unknown, so the user has to input a sample answer sheet and specify its layout. The paper layout is an array of question layouts, and a question layout is a set of squares. These squares are the small areas the candidate is supposed to color. The paper layout also contains a reference to a rectangle that indicates where the candidate id should be. The user can input more than one sample sheet, for example page one and page two of the same exam, or two variations of the same exam. These samples are then put through the preparation phase and their layouts are modified accordingly.
Next, the user must choose a grading system. The data structure containing this information must be such that, for every question, the user can choose any combination of square states (checked / not checked) and set a negative or positive grade for it.
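For illustration only, these structures could be sketched as follows in Python (the class and field names, and the use of a mapping from checked-square combinations to grades, are assumptions of this sketch rather than the exact implementation):

# A minimal sketch of the layout and grading data structures
# (names and types are illustrative assumptions, not the actual implementation).
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

@dataclass
class Square:
    x: int        # position and size of the square on the uniform-size image
    y: int
    w: int
    h: int

@dataclass
class QuestionLayout:
    squares: List[Square]                        # areas the candidate may color
    # maps a combination of checked square indices to the grade it earns
    grades: Dict[FrozenSet[int], float] = field(default_factory=dict)

@dataclass
class PaperLayout:
    questions: List[QuestionLayout]
    id_rect: Tuple[int, int, int, int]           # rectangle holding the candidate id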
Finally, the user inputs the set of sheets to be graded; this is when the main algorithm runs. It is divided into two phases:
a) Preparation phase.
b) Parsing & grading phase.
2.1 Preparation phase
This is where most of the image processing takes place. It takes the scanned image as input and returns a processed version of it as output. This image has a predefined size called the uniform size. The algorithm that generates it is as follows:
• Convert the input image to grayscale, then resize it to the uniform size using bilinear interpolation.
• Convert the image to binary.
• Apply a noise removal filter.
• If rotation correction is on, detect the best angle, then rotate the image using bilinear interpolation.
• Move the image to the top left (remove all horizontal white lines at the top and all vertical white lines on the left).
• Generate a thumbnail by resizing the image using nearest-neighbor, source-to-destination.
• If the input image is a sample itself, halt.
• Compare the thumbnail to all samples.
• If the upside-down check is on, turn the image upside-down and move it to the top left again, then generate another thumbnail and compare it to all samples.
If only one match is found, proceed to the next phase; otherwise halt and report an error.
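For illustration, the core of this phase (without noise removal, rotation correction and the upside-down check) could be sketched roughly as follows, assuming Pillow and NumPy; the uniform size, thumbnail size and binarization threshold below are arbitrary choices for the sketch:

# Rough sketch of the preparation phase (assumed sizes, threshold and names).
import numpy as np
from PIL import Image

UNIFORM_SIZE = (1240, 1754)    # assumed uniform size (width, height)
THUMB_SIZE = (64, 64)          # assumed thumbnail size

def move_to_top_left(binary):
    # Drop the all-white rows at the top and the all-white columns on the left.
    rows = np.where(binary.any(axis=1))[0]
    cols = np.where(binary.any(axis=0))[0]
    out = np.zeros_like(binary)
    if rows.size and cols.size:
        content = binary[rows[0]:, cols[0]:]
        out[:content.shape[0], :content.shape[1]] = content
    return out

def prepare(path):
    img = Image.open(path).convert("L")                # grayscale
    img = img.resize(UNIFORM_SIZE, Image.BILINEAR)     # resize to the uniform size
    binary = np.array(img) < 128                       # binarize (True = black pixel)
    # noise removal, rotation correction and the upside-down check would go here
    binary = move_to_top_left(binary)                  # move content to the top left
    thumb_img = Image.fromarray(binary.astype(np.uint8) * 255)
    thumb = np.array(thumb_img.resize(THUMB_SIZE, Image.NEAREST)) > 0
    return binary, thumb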
Figure 1: Phase one diagram.
2.1.1 Detecting the angle of a sheet
Detecting the angle of a sheet means finding the angle α by which the paper was rotated inside the scanner. To find α, we perform a progressive search in a small range, for example [-10°, 10°]. We start with a large step (e.g. 5°), then start again from the angle which had the best score in the last pass and reduce both the range and the step until we are satisfied with the accuracy.
The score for an angle α is calculated by rotating the image by α, then counting the number of uninterrupted horizontal white lines. This method is not perfect and may give false results, causing the image to be rotated by a random angle.
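A rough sketch of this coarse-to-fine search could look as follows (the rotation here relies on SciPy's bilinear interpolation, and the range, step and shrinking factor are assumptions of the sketch):

# Sketch of the progressive angle search (assumed parameters).
import numpy as np
from scipy.ndimage import rotate

def score(binary, angle):
    # Number of completely white rows after rotating by `angle` degrees.
    rotated = rotate(binary.astype(float), angle, reshape=False, order=1) > 0.5
    return int((~rotated).all(axis=1).sum())

def detect_angle(binary, rng=10.0, step=5.0, tol=0.1):
    best = 0.0
    while step > tol:
        candidates = np.arange(best - rng, best + rng + step, step)
        best = max(candidates, key=lambda a: score(binary, a))
        rng, step = step, step / 5.0    # shrink both the range and the step
    return float(best)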
2.1.2 Comparing thumbnails
To compare thumbnails, we divide the thumbnail images into 8 by 8 squares, then compare these squares and calculate the percentage of the match. If two squares have very different numbers of black pixels, they are considered different. If the numbers are relatively close, most of the black pixels in one of the squares must match their peers in the other one for them to be considered matching. The final result is the number of matching squares over the total number of squares.
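As an illustration, the comparison could be sketched as follows (thumbnails are boolean NumPy arrays where True means black; the 8-pixel block size and the two thresholds are assumptions of the sketch):

# Sketch of the block-wise thumbnail comparison (assumed thresholds).
import numpy as np

def compare_thumbnails(a, b, block=8, count_tol=0.5, pixel_match=0.7):
    h, w = a.shape
    matches = total = 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            sa = a[y:y + block, x:x + block]
            sb = b[y:y + block, x:x + block]
            na, nb = sa.sum(), sb.sum()
            total += 1
            if max(na, nb) == 0:
                matches += 1                            # both squares are empty
            elif min(na, nb) / max(na, nb) < count_tol:
                continue                                # very different pixel counts
            elif (sa & sb).sum() / max(na, nb) >= pixel_match:
                matches += 1                            # most black pixels coincide
    return matches / total                              # fraction of matching squares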
2.2 Parsing & grading phase
From the previous phase, we obtain a processed binary image and we know to which answer keys (sample) it corresponds. Since all images are processed in the same way in phase one, the layout of the answer keys should be the same for all matching images. If there was a rotation during scanning, rotation correction should fix it, and moving the image's content to the top-left corner means that the top-left corners of the images will match and therefore the rest of the image will match.
Figure 2: Visualization of a question layout.
Student answers are read from the binary images using the layouts. A square is considered checked if over 50% of its area is black. Once that is done, the image is cropped so that only the candidate id rectangle remains.
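Using the Square sketch given earlier, the check that a square is colored is a one-liner (illustrative only):

# A square is considered checked when over 50% of its area is black
# (binary is a boolean NumPy array, True = black; sq is a Square).
def is_checked(binary, sq):
    region = binary[sq.y:sq.y + sq.h, sq.x:sq.x + sq.w]
    return region.mean() > 0.5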
To grade the paper, the answers of the student are compared to the answer keys. For every question, if the combination of the squares read from the student's sheet matches a combination in the answer keys for that question, then the total grade is incremented by the value of that combination. If all squares are white, meaning the candidate skipped the question, the candidate gets the no-answer penalty. If no match is found for the candidate's answer, the candidate gets the wrong-answer penalty.
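In outline, the grading step could be sketched as follows (the function and parameter names, and the way the penalties are applied, are assumptions of this sketch):

# Sketch of the grading step; answer_key[q] maps a frozenset of checked
# square indices to the grade that combination earns.
def grade_sheet(student_answers, answer_key, no_answer_penalty, wrong_answer_penalty):
    total = 0.0
    for q, checked in enumerate(student_answers):   # checked: set of square indices
        combos = answer_key[q]
        if not checked:
            total += no_answer_penalty               # the candidate skipped the question
        elif frozenset(checked) in combos:
            total += combos[frozenset(checked)]      # matched a graded combination
        else:
            total += wrong_answer_penalty            # no matching combination
    return total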
Once the total grade is calculated, the candidate id must be parsed. First, the border of the rectangle is removed along with anything connected to it, and the image is cropped so that there are no horizontal or vertical white lines on the edges. Next, the image is divided into lines, ignoring white spaces; there can be one line or more.
Figure 3: Example of candidate id.
Every line is then divided into columns; this results in small images each containing a digit, and white spaces are ignored again.
Figure 4: Example of candidate id divided into two lines.
Figure 5: Example of candidate id divided into digits.
After that, every digit is cropped and parsed on its own. To confirm that it is a seven-segment digit, we look for two white squares, one in the top half and the other in the bottom half; failing to locate either means it is not a valid digit. Next, we look for six points in the resulting image, four in the corners and two at the center right and center left (Figure 6).
Figure 6: Example of identifying seven-segment digits and locating the six points on the digit.
Now the goal is to determine which of the seven segments this digit has. To do that, we use these two rules:
a) The segment does not exist if one of its points is missing.
b) The segment exists if the distance between its two points is slightly less than or equal to the path between them.
Figure 7: Determining the existence of segments.
The final step obtains the digits from the segments, then concatenates the digits from left to right, starting from the top line.
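Once the set of existing segments is known for every digit, turning it into the candidate id is a simple lookup and concatenation; for example (segment labels follow the usual seven-segment convention, and the code is an illustrative sketch, not our exact implementation):

# Standard seven-segment encoding: 'a' top, 'b' top-right, 'c' bottom-right,
# 'd' bottom, 'e' bottom-left, 'f' top-left, 'g' middle.
SEGMENTS_TO_DIGIT = {
    frozenset("abcdef"):  "0",
    frozenset("bc"):      "1",
    frozenset("abdeg"):   "2",
    frozenset("abcdg"):   "3",
    frozenset("bcfg"):    "4",
    frozenset("acdfg"):   "5",
    frozenset("acdefg"):  "6",
    frozenset("abc"):     "7",
    frozenset("abcdefg"): "8",
    frozenset("abcdfg"):  "9",
}

def segments_to_id(digit_segment_sets):
    # Concatenate the recognized digits left to right; unknown patterns become '?'.
    return "".join(SEGMENTS_TO_DIGIT.get(frozenset(s), "?") for s in digit_segment_sets)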
3 RESULTS AND DISCUSSION
3.1 Detecting the angle of a sheet
We tested this method on a dataset of 503 images of MCQ tests taken from random PDF files that we downloaded from the internet. We rotated every image from -8° to +8° with a step of 0.05°, then ran the angle detection and calculated the error (161 tests per sample, 80983 tests in total). A test fails if the error is greater than 1°, and a sample fails if it fails one of its 161 tests. The samples that failed had one thing in common: most of the paper is white, or the paper is mostly shapes and figures, meaning these samples did not have long lines of text. One solution to this problem is to add three or more thick horizontal lines anywhere on the paper (top, bottom or middle). Another solution is to place the papers properly in the scanner and avoid the angle correction completely.
Figure 8: Distribution of angle detection errors.
Figure 9: Angle detection test results.
3.2 Comparing thumbnails
We tested this method on the same dataset of 503 images; all the images were unique, though some of them were somewhat similar. First, thumbnails for the dataset were generated. Then every sheet in the dataset was rotated randomly and random noise was added to it; it was then put through phase one, and its thumbnail was generated and compared to all thumbnails of the dataset. Rotating the sheets and adding noise simulated what may happen to a sheet during copying and scanning.
Some of the sheets from the rotation test could not be rotated back to normal, so they were ignored in this test. Only 493 sheets passed, and the total number of comparisons was 123549. Every sheet is supposed to match only the original form of itself, so we should have 493 matches, and we did. Our implementation of the compare method returns values from 10% to 99%.
Figure 10: Positive comparison test results (493 matches in total).
Figure 11: Negative comparison test results.
Only six negative test results are above 60%, so we can determine that if two images are the same, they should have a match percentage above 75%. This method is good enough, since we will prevent the user from adding matching samples.
3.3 Performance
For the performance evaluation, we tested our program on a laptop with the following specifications:
- Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
- 8.00 GB RAM
- 64-bit Windows 8.1, x64-based processor.
The angle detection takes most of the processing time; its average is 202 ms. Grading an answer sheet takes 62 ms with angle detection turned off. This can be further improved by separating the I/O from the processing, by creating one thread to handle the I/O operations and another to handle the main work.
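A possible shape for that split, using only the Python standard library (load_sheet and do_grading below are placeholders standing in for the real I/O and grading routines):

# Sketch of separating I/O from processing with two threads.
import queue
import threading

def load_sheet(path):
    with open(path, "rb") as f:           # stands in for reading a scanned sheet
        return f.read()

def do_grading(data):
    return len(data)                      # placeholder result for the sketch

def grading_worker(q, results):
    while (item := q.get()) is not None:
        path, data = item
        results[path] = do_grading(data)  # stands in for the actual grading work

def grade_all(paths):
    q, results = queue.Queue(maxsize=8), {}
    worker = threading.Thread(target=grading_worker, args=(q, results))
    worker.start()
    for p in paths:                       # the main thread only performs the I/O
        q.put((p, load_sheet(p)))
    q.put(None)                           # signal the worker that input is finished
    worker.join()
    return results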
This program can grade an average of 229 sheets per minute, over one thousand sheets every five minutes, and over 12000 sheets every hour, while other programs grade 12000 per day [4].
4 CONCLUSIONS
The algorithm can be implemented in any programming language and does not require any specific third-party libraries. All image processing must be implemented by hand to ensure performance and abstraction (i.e. methods must not have extra features that may slow performance). This method can be further improved: parsing seven-segment digits can be extended to include characters, the rotation correction can be improved by finding other evaluation functions, and the processing speed can be improved by multithreading.
REFERENCES
[1] Rakesh, S. et al. (2013) Cost effective optical mark reader. International Journal of Computer Science and Artificial Intelligence. 3 (2), 44.
[2] Chidrewar, V. et al. (n.d.) Mobile Based Auto Grading Of Answersheets. [Online]. Available from: https://pdfs.semanticscholar.org/00b9/8d6eb85b50b5f172a94660c82639a829d3bb.pdf (Accessed 23 April 2017).
[3] Hendriks, R. (2012) Automatic exam correction. UVA Universiteit van Amsterdam. [Online]. Available from: https://esc.fnwi.uva.nl/thesis/centraal/files/f438164865.pdf (Accessed 23 April 2017).
[4] Fisteus, J. A. et al. (2013) Grading multiple choice exams with low-cost and portable computer-vision techniques. Journal of Science Education and Technology. 22 (4), 560–571.
[5] Tavana, A. M. et al. (2016) 'Optimizing the correction of MCQ test answer sheets using digital image processing', in Information and Knowledge Technology (IKT), 2016 Eighth International Conference on. 2016 IEEE. pp. 139–143.
[6] Thérèse, B. & Warnier, L. (2016) Evaluer les acquis des étudiants à l'aide de QCM. [Online]. Available from: https://www.uclouvain.be/cps/ucl/doc/ipm/documents/VADEMECUM_Mars_2016.pdf (Accessed 23 April 2017).
[7] de Assis Zampirolli, F., Batista, V. R. & Quilici-Gonzalez, J. A. (2016) 'An automatic generator and corrector of multiple choice tests with random answer keys', in Frontiers in Education Conference (FIE), 2016 IEEE. pp. 1–8.