AUTOMATIC SYSTEM FOR GRADING MULTIPLE CHOICE QUESTIONS

Muaaz Habeek, Faculty of NTIC, University of Abdelhamid Mehri - Constantine 2, Algeria. Muaaz.h.is@gmail.com
Charaf Eddine Dridi, Faculty of NTIC, University of Abdelhamid Mehri - Constantine 2, Algeria. 1100charaf@gmail.com
Mohamed Badeche, Faculty of NTIC, University of Abdelhamid Mehri - Constantine 2, Algeria. badeche_mohamed@yahoo.fr

ABSTRACT
Although technology for automatic grading of multiple-choice exams exists, it is neither efficient nor as automatic as it claims to be. All proposed methods impose a predefined answer-sheet format that looks like a crossword grid or a chessboard. Because of this format, all questions must have the same number of choices. Such an answer sheet is not clear, and candidates taking the exam can and will accidentally mark the wrong cell in the table. Most of these methods also assume that every question has exactly one correct answer. This paper proposes an algorithm that does not require any special format, works at all scanning resolutions and is genuinely fast.

KEYWORDS
Multiple Choice Questions, grading, exams.

1 INTRODUCTION
Nowadays, nearly everything is computerized, especially dumb, dull or dangerous tasks, such as grading Multiple Choice Question (MCQ) exams. Automatic MCQ grading is a relatively young research topic. The first systems were built on Optical Mark Recognition (OMR) forms coupled with OMR software and dedicated scanners. These systems target large organizations and universities; small institutes and individual teachers cannot afford such costly systems [1] [2] [3]. Automatic MCQ grading systems are based on extracting response marks from scanned exam answer sheets. Many methods, such as [4] and [5], impose a special sheet format and do not support all types of MCQ: conventional MCQ, alternative MCQ and complex MCQ [6]. Some systems include test generators [7], i.e. they provide the ability to generate the forms.
This kind of system is often the most restrictive because of its strict specifications and conditions (e.g. the same special form format for all exams, and a candidate identification (id) written in a complex grid of checkboxes). Many articles have been published in this domain, yet the existing software fails to deliver an easy and practical solution because of its poor image processing techniques. In this paper, we present a method that aims to fix these issues and imposes as few restrictions as possible on the users (candidate / examiner). The method does not require a special sheet format, although it may require small additions to the answer sheet, such as a rectangular box to contain the candidate id. The candidate id is written in a more natural way, as seven-segment digits. The method allows a different number of options per question; for example, the first question may have two options while the second has twenty. The options need not be marked with squares; any type of enumeration can be used.

2 PROPOSED SYSTEM
Unlike other programs, the paper layout is unknown in advance, so the user has to input a sample answer sheet and specify its layout. A paper layout is an array of question layouts, and a question layout is a set of squares. These squares are the small areas the candidate is supposed to color. The paper layout also contains a reference to the rectangle that indicates where the candidate id should be. The user can input more than one sample sheet, for example page one and page two of the same exam, or two variations of the same exam. These samples are then put through the preparation phase and their layouts are modified accordingly. Next, the user must choose a grading system. The data structure holding this information must allow the user, for every question, to choose any combination of square states (checked / not checked) and assign it a negative or positive grade.
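As a sketch of one possible data structure for such a grading system (all names, grades and penalties here are hypothetical illustrations, not taken from the paper):

```python
# Sketch (not the authors' code): per question, map a combination of
# square states to a grade. A combination is a tuple of booleans,
# one per square (True = checked). All values below are made up.
grading_scheme = [
    {   # question 1: two options, single correct answer
        (True, False): 1.0,
    },
    {   # question 2: three options, partial credit allowed
        (True, False, True): 2.0,   # fully correct combination
        (True, False, False): 1.0,  # partially correct
    },
]

WRONG_ANSWER_PENALTY = -0.5  # applied when no combination matches
NO_ANSWER_PENALTY = 0.0      # applied when all squares are white

def grade_sheet(answers, scheme):
    """Sum the grades for a list of per-question square-state tuples."""
    total = 0.0
    for combo, rules in zip(answers, scheme):
        if not any(combo):  # all squares white: question was skipped
            total += NO_ANSWER_PENALTY
        else:
            total += rules.get(combo, WRONG_ANSWER_PENALTY)
    return total
```

Because any combination of checked squares can carry its own grade, this naturally covers questions with multiple correct answers and negative marking.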
Finally, the user inputs the set of sheets to be graded; this is when the main algorithm runs. It is divided into two phases:
a) Preparation phase.
b) Parsing & grading phase.

2.1 Preparation phase
This is where most of the image processing takes place. This phase takes the scanned image as input and returns a processed version of it. The output image has a predefined size called the uniform size. The algorithm that generates it is as follows:
a) Convert the input image to grayscale, then resize it to the uniform size using bilinear interpolation.
b) Convert the image to binary.
c) Apply a noise removal filter.
d) If rotation correction is on, detect the best angle, then rotate the image using bilinear interpolation.
e) Move the image to the top left (remove all horizontal white lines at the top and all vertical white lines on the left).
f) Generate a thumbnail by resizing the image using nearest-neighbor source-to-destination mapping.
g) If the input image is a sample itself, halt.
h) Compare the thumbnail to all samples.
i) If the upside-down check is on, turn the image upside-down, move it to the top left again, generate another thumbnail and compare it to all samples.
j) If exactly one match is found, proceed to the next phase; otherwise halt and report an error.

Figure 1: Phase one diagram.

2.1.1 Detecting the angle of a sheet
Detecting the angle of a sheet means finding the angle α by which the paper was rotated inside the scanner. To find α, we perform a progressive search in a small range, for example [-10°, 10°]. We start with a large step (e.g. 5°), then start again from the angle which had the best score in the last pass and reduce both the range and the step until we are satisfied with the accuracy. The score of an angle α is calculated by rotating the image by α, then counting the number of uninterrupted horizontal white lines. This method is not perfect and may give false results, causing the image to be rotated by a random angle.

2.1.2 Comparing thumbnails
To compare two thumbnails, we divide the thumbnail images into 8 by 8 squares, then compare these squares and calculate the percentage of matches. If two squares have very different numbers of black pixels, they are considered different. If the numbers are relatively close, most of the black pixels of one image must match their peers in the other for the squares to be considered matching. The final result is the number of matching squares over the total number of squares.

2.2 Parsing & grading phase
From the previous phase, we obtain a processed binary image and we know to which answer keys (sample) it corresponds. Since all images are processed in the same way in phase one, the layout of the answer keys should be valid for all matching images. If there was a rotation during scanning, rotation correction should fix it, and moving the image's content to the top-left corner means that the top-left corners of the images will match, and therefore the rest of the images will match as well.

Figure 2: Visualization of a question layout.

Student answers are read from the binary images using the layouts. A square is considered checked if over 50% of its area is black. Once that is done, the image is cropped so that only the candidate id rectangle remains. To grade the paper, the answers of the student are compared to the answer keys: for every question, if the combination of squares read from the student's sheet matches a combination in the answer keys for that question, the total grade is incremented by the value of that combination. If all squares are white, meaning the candidate skipped the question, the candidate gets the no-answer penalty. If no match is found for the candidate's answer, he gets the wrong-answer penalty. Once the total grade is calculated, the candidate id must be parsed. First, the border of the rectangle is removed along with anything connected to it, and the image is cropped so that no all-white horizontal or vertical lines remain on the edges.
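This cropping step (and the move-to-top-left step of phase one) amounts to trimming all-white rows and columns from the image edges. A minimal sketch (not the authors' code), representing the binary image as a list of rows of 0/1 pixels with 1 = black:

```python
# Sketch: crop a binary image so that no all-white (all-zero) rows or
# columns remain on its edges. Returns [] for a completely white image.
def crop_to_content(img):
    rows = [i for i, row in enumerate(img) if any(row)]
    if not rows:
        return []
    cols = [j for j in range(len(img[0])) if any(row[j] for row in img)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

Moving the image to the top left is the one-sided variant of the same idea: only the leading all-white rows and columns are removed, while the bottom and right margins are kept.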
Next, the image is divided into lines, ignoring white spaces; there can be one line or more.

Figure 3: Example of a candidate id.

Figure 4: Example of a candidate id divided into two lines.

Every line is then divided into columns, again ignoring white spaces. This results in small images, each containing one digit.

Figure 5: Example of a candidate id divided into digits.

After that, every digit is cropped and parsed on its own. To confirm that it is a seven-segment digit, we look for two white squares, one in the top half and the other in the bottom half; failing to locate either means it is not a valid digit. Next, we look for six points in the resulting image: four in the corners and two at the center right and center left (Figure 6).

Figure 6: Example of identifying seven-segment digits and locating the six points on the digit.

Now the goal is to determine which of the seven segments this digit has. To do that, we use these two rules:
a) A segment does not exist if one of its points is missing.
b) A segment exists if the distance between its two points is slightly less than or equal to the path between them.

Figure 7: Determining the existence of segments.

The final step obtains the digits from the segments, then concatenates the digits from left to right, starting from the top line.

3 RESULTS AND DISCUSSION
3.1 Detecting the angle of a sheet
We tested this method on a dataset of 503 images of MCQ tests taken from random PDF files downloaded from the internet. We rotated every image from -8° to +8° with a step of 0.1°, then ran the angle detection and calculated the error (161 tests per sample, 80983 tests in total). A test fails if the error is greater than 1°, and a sample fails if it fails one of its 161 tests. The samples that failed had one thing in common: most of the paper is white, or the paper is mostly shapes and figures, meaning these samples did not have long lines of text.
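The final segments-to-digit step of the id parser can be sketched as a simple lookup table (not the authors' implementation; it assumes the conventional segment naming a–g, with a = top, b = top-right, c = bottom-right, d = bottom, e = bottom-left, f = top-left, g = middle):

```python
# Sketch: decode a set of detected segments into a decimal digit.
SEGMENTS_TO_DIGIT = {
    frozenset("abcdef"):  "0",
    frozenset("bc"):      "1",
    frozenset("abged"):   "2",
    frozenset("abgcd"):   "3",
    frozenset("fgbc"):    "4",
    frozenset("afgcd"):   "5",
    frozenset("afgedc"):  "6",
    frozenset("abc"):     "7",
    frozenset("abcdefg"): "8",
    frozenset("abcdfg"):  "9",
}

def decode_digit(segments):
    """Return the digit for a set of present segments, or None if invalid."""
    return SEGMENTS_TO_DIGIT.get(frozenset(segments))

def decode_id(lines):
    """Concatenate digits left to right, top line first.

    `lines` is a list of rows, each a list of per-digit segment sets.
    """
    out = []
    for row in lines:
        for seg in row:
            digit = decode_digit(seg)
            if digit is not None:
                out.append(digit)
    return "".join(out)
```

Any segment combination outside this table is rejected, which matches the paper's validation strategy of discarding shapes that are not valid seven-segment digits.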
One solution to this problem is to add three or more thick horizontal lines anywhere on the paper (top, bottom or middle). Another solution is to place the papers properly in the scanner and avoid angle correction completely.

Figure 8: Distribution of angle detection errors.

Figure 9: Angle detection test results.

3.2 Comparing thumbnails
We tested this method on the same dataset of 503 images; all the images were unique, though some of them were quite similar. First, thumbnails for the dataset were generated. Then every sheet in the dataset was rotated randomly and random noise was added to it, after which it was put through phase one; its thumbnail was generated and compared to all thumbnails of the dataset. Rotating the sheets and adding noise simulates what may happen to a sheet during copying and scanning. Some of the sheets from the rotation test could not be rotated back to normal, so they were ignored in this test. Only 493 sheets passed, and the total number of comparisons was 123549. Every sheet is supposed to match the original form of itself only, so we should have 493 matches, and we did. Our implementation of the compare method returns values from 10% to 99%.

Figure 10: Positive comparison test results (total: 493 matches).

Figure 11: Negative comparison test results (distribution of match percentages).

Only six negative test results are above 60%, so we can determine that if two images are the same, they should have a match percentage above 75%. This method is good enough, since we will prevent the user from adding matching samples.

3.3 Performance
We tested our program on a laptop with the following specifications:
- Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
- 8.00 GB RAM
- 64-bit Windows 8.1, x64-based processor.
The angle detection takes most of the processing time; its average is 202 ms.
Grading an answer sheet takes 62 ms with angle detection turned off. This can be further improved by separating the I/O from the processing: one thread handles the I/O operations while another handles the main work. The program can grade an average of 229 sheets per minute, over one thousand sheets every five minutes, and over 12000 sheets every hour, while other programs grade 12000 per day [4].

4 CONCLUSIONS
The algorithm can be implemented in any programming language and does not require any specific third-party libraries. All image processing must be implemented by hand to ensure performance and abstraction (i.e. methods must not have extra features that may slow them down). This method can be further improved: parsing seven-segment digits can be extended to include characters, the rotation correction can be improved by finding other evaluation functions, and the processing speed can be improved by multithreading.

REFERENCES
[1] Rakesh, S. et al. (2013) Cost effective optical mark reader. International Journal of Computer Science and Artificial Intelligence. 3 (2), 44.
[2] Chidrewar, V. et al. (n.d.) Mobile Based Auto Grading Of Answersheets. [Online]. Available from: https://pdfs.semanticscholar.org/00b9/8d6eb85b50b5f172a94660c82639a829d3bb.pdf (Accessed 23 April 2017).
[3] Hendriks, R. (2012) Automatic exam correction. UVA Universiteit van Amsterdam. [Online]. Available from: https://esc.fnwi.uva.nl/thesis/centraal/files/f438164865.pdf (Accessed 23 April 2017).
[4] Fisteus, J. A. et al. (2013) Grading multiple choice exams with low-cost and portable computer-vision techniques. Journal of Science Education and Technology. 22 (4), 560–571.
[5] Tavana, A. M. et al. (2016) 'Optimizing the correction of MCQ test answer sheets using digital image processing', in Information and Knowledge Technology (IKT), 2016 Eighth International Conference on. IEEE. pp. 139–143.
[6] Thérèse, B. & Warnier, L. (2016) Evaluer les acquis des étudiants à l'aide de QCM. [Online]. Available from: https://www.uclouvain.be/cps/ucl/doc/ipm/documents/VADEMECUM_Mars_2016.pdf (Accessed 23 April 2017).
[7] de Assis Zampirolli, F., Batista, V. R. & Quilici-Gonzalez, J. A. (2016) 'An automatic generator and corrector of multiple choice tests with random answer keys', in Frontiers in Education Conference (FIE), 2016 IEEE. pp. 1–8.