Computer Vision — Auto grading Handwritten Mathematical Answersheets

Computer vision model to automatically correct and grade mathematical worksheets using Python.

Divyaprabha M · Published in Towards Data Science · 8 min read · Dec 27, 2019

https://towardsdatascience.com/computer-vision-auto-grading-handwritten-mathematical-answersheets-8974744f72dd

Photo by Jeswin Thomas on Unsplash

Grading is an essential part of education. Assessing each answer sheet manually and giving a fair, unbiased, and valid grade is difficult most of the time. This article is about my internship project with my mentor Bijon Guha: building a computer vision model that automatically evaluates answer sheets, so that grades are based solely on the student's performance.

Overview

Below are the sample worksheets we are going to examine and grade.

Figure: Sample worksheets

Each of these worksheets was written by a different person, so there are variations in line width, character style within the same page, pen nib width, character spacing, and so on. The idea is to check each line in the worksheet and mark it with a box: a green box means the line is correct and a red box means the line is incorrect.
Figure: Sample output

(Note: I have not included all the code here. If you want to see the rest, you can visit my GitHub, where the tutorial is available as an ipynb notebook.)

Workflow

Figure: Workflow diagram

There are two modules in the workflow: the Workspace Detection module and the Analysis module. The Workspace Detection module is responsible for detecting the multiple workspaces in a given sheet of paper. The Analysis module is responsible for detecting and localizing the characters in each line of a single workspace, analyzing them mathematically, and then drawing a red or green box around each line depending on its correctness.

Workspace Detection

The Workspace Detection module assumes that the scanned worksheet contains valid rectangular boxes. The image below shows the worksheet design. The three largest rectangular boxes in the worksheet are the workspaces.

Figure: Worksheet design

Workspace detection is done using OpenCV. We first find the rectangular boxes, then sort them by their position in the worksheet. Since the worksheet contains many rectangles, we then have to select the valid workspaces from among them. Let's see how each step is done.

Step 1: Finding Rectangular Boxes

A rectangle is formed by two horizontal and two vertical lines, so the first step is to find all the horizontal and vertical lines while ignoring digits, symbols, and anything else written on the worksheet.
The code below first creates a binary image called "vertical_lines_img", which contains all the vertical lines present in the worksheet, and then another binary image called "horizontal_lines_img", which contains all the horizontal lines present in the worksheet.

```python
import cv2
import numpy as np

# img is assumed to be the scanned worksheet, already loaded as a BGR image
image_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Fixed-level inverse thresholding: pixels at or below 250 become white (255)
thresh, binary_image = cv2.threshold(image_gray, 250, 255, cv2.THRESH_BINARY_INV)

# Kernel length: 1/50th of the image width
kernel_length = binary_image.shape[1] // 50

vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, kernel_length))
hori_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_length, 1))
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# Erode then dilate with a tall, thin kernel to keep only vertical lines
img_temp1 = cv2.erode(binary_image, vertical_kernel, iterations=3)
vertical_lines_img = cv2.dilate(img_temp1, vertical_kernel, iterations=3)

# Erode then dilate with a wide, flat kernel to keep only horizontal lines
img_temp2 = cv2.erode(binary_image, hori_kernel, iterations=3)
horizontal_lines_img = cv2.dilate(img_temp2, hori_kernel, iterations=4)
cv2.imwrite("image_6_OP1.jpg", horizontal_lines_img)
```

Next, we have to add the image "vertical_lines_img" to "horizontal_lines_img" to get the final image.

```python
# Join the horizontal and vertical line images with equal weights
alpha = 0.5
beta = 1.0 - alpha
img_final_bin = cv2.addWeighted(vertical_lines_img, alpha, horizontal_lines_img, beta, 0)
img_final_bin = cv2.erode(~img_final_bin, kernel, iterations=2)
# The source listing is cut off after "cv2."; cv2.THRESH_OTSU is assumed here
(thresh, img_final_bin) = cv2.threshold(img_final_bin, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
```
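To see why eroding and then dilating with a long, thin kernel keeps ruled lines and drops handwriting, here is a minimal NumPy-only sketch of a single erode/dilate pass on a synthetic image. The helper `vertical_opening` and the test image are assumptions made for illustration; the article's pipeline uses `cv2.erode` and `cv2.dilate` with three iterations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def vertical_opening(binary: np.ndarray, length: int) -> np.ndarray:
    """Erode then dilate with a (length x 1) vertical kernel; length must be odd."""
    pad = length // 2
    # Erosion: a pixel survives only if every pixel in its vertical window is set.
    # Padding with 255 means the image border itself never erodes anything.
    padded = np.pad(binary, ((pad, pad), (0, 0)), constant_values=255)
    eroded = sliding_window_view(padded, length, axis=0).min(axis=-1)
    # Dilation: a pixel is set if any pixel in its vertical window is set.
    padded = np.pad(eroded, ((pad, pad), (0, 0)), constant_values=0)
    return sliding_window_view(padded, length, axis=0).max(axis=-1)


# Synthetic binary image: one long vertical stroke and one small "character" blob
image = np.zeros((30, 30), dtype=np.uint8)
image[2:28, 5] = 255          # vertical line, 26 pixels tall
image[10:14, 15:19] = 255     # 4 x 4 blob standing in for handwriting

opened = vertical_opening(image, length=9)

line_only = np.zeros_like(image)
line_only[2:28, 5] = 255
print(np.array_equal(opened, line_only))
```

The blob is shorter than the kernel, so erosion removes it completely and dilation has nothing to restore, while the long stroke shrinks and then grows back to its original extent.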
Figure: Adding the vertical and horizontal lines

A contour is a curve joining all the continuous points along a boundary that have the same intensity. OpenCV has a findContours() function that extracts the contours from an image. Each individual contour is a NumPy array of (x, y) coordinates of the boundary points of an object. We can use it to find all the objects in the final image (the only objects in the final image are the rectangles).

Since the final image is just a binary version of the original image, the coordinates of the rectangles in the final image are the same as their coordinates in the original image.

Now that we know the coordinates, let's draw them on the original image using OpenCV's drawContours() function.

```python
import matplotlib.pyplot as plt

# Two return values as in OpenCV 4.x; OpenCV 3.x returns (image, contours, hierarchy)
contours, hierarchy = cv2.findContours(img_final_bin, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(img, contours, -1, (0, 255, 0), 3)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # matplotlib expects RGB, OpenCV stores BGR
```

Code to find and draw the contours

Step 2: Sorting the contours

Now that we have found all the rectangles, it's time to sort them top-to-bottom based on their coordinates. The code below does that for us.
```python
import cv2


def sort_contours(cnts, method="left-to-right"):
    '''
    sort_contours : Function to sort contours
    argument:
        cnts (array): image contours
        method (string): sorting direction
    output:
        cnts (tuple): sorted contours
        boundingBoxes (tuple): bounding boxes as (x, y, w, h)
    '''
    # initialize the reverse flag and sort index
    reverse = False
    i = 0

    # handle if we need to sort in reverse
    if method == "right-to-left" or method == "bottom-to-top":
        reverse = True

    # handle if we are sorting against the y-coordinate rather than
    # the x-coordinate of the bounding box
    if method == "top-to-bottom" or method == "bottom-to-top":
        i = 1

    # construct the list of bounding boxes and sort contours and boxes
    # together on the chosen coordinate
    boundingBoxes = [cv2.boundingRect(c) for c in cnts]
    (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
                                        key=lambda b: b[1][i], reverse=reverse))

    # return the sorted contours and bounding boxes
    return (cnts, boundingBoxes)
```

Function to sort contours

Code Reference

```python
(contours, boundingBoxes) = sort_contours(contours, method="top-to-bottom")
```

Sorted contours

The sort_contours function returns the contours and their bounding boxes (top-left corner plus width and height), sorted by the method we pass in. In this case the method is top-to-bottom.

Step 3: Selection based on area

There are many rectangles, but we only need the three largest ones. How can we select them? One answer is to compute the area of each rectangle and then choose the three rectangles with the largest area.
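The area-based selection described above can be sketched as follows. This is a minimal illustration rather than the notebook's code: the helper name `select_largest_boxes` and the sample boxes are hypothetical, and the boxes are assumed to be (x, y, w, h) tuples in the layout returned by `cv2.boundingRect`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), the layout returned by cv2.boundingRect


def select_largest_boxes(boxes: List[Box], count: int = 3) -> List[Box]:
    """Return the `count` largest boxes by area, kept in top-to-bottom order."""
    largest = sorted(boxes, key=lambda b: b[2] * b[3], reverse=True)[:count]
    # Re-sort the winners by their y coordinate so the workspace order is preserved
    return sorted(largest, key=lambda b: b[1])


# Hypothetical bounding boxes: three big workspaces plus two small header cells
boxes = [
    (10, 5, 80, 20),      # small header cell
    (10, 40, 500, 300),   # workspace 1
    (520, 5, 60, 20),     # small header cell
    (10, 360, 500, 280),  # workspace 2
    (10, 660, 500, 310),  # workspace 3
]
workspaces = select_largest_boxes(boxes)
print(workspaces)
```

Sorting by area first and by position second keeps the top-to-bottom order from Step 2, so the first selected box is still the first workspace on the page.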
Figure: Overall solution

These selected rectangles are the workspaces, which are then extracted from the worksheet and sent to the Analysis module.

Analysis Module

As explained above, the Analysis module first detects the lines, predicts the characters in each line, forms an equation from the predicted characters, and finally evaluates each line and marks it with a box.

Line Detection

Detecting the lines is the tricky part. Everyone has their own way of solving equations: some solve step by step, some solve in a single line, some write steps that run for pages, and some write exponents so far from the equation that the module is tempted to treat them as a separate line.

Our line detection module assumes that there is a sufficient gap between lines and that exponents overlap vertically with the line they belong to. First, each detected workspace is converted to a binary image, which is then compressed into a single array (one value per row) so that we can take the forward derivative. Wherever a line starts or ends, there is a change in the derivative.

Figure: Change in derivatives of a binary image

```python
import cv2
import numpy as np

# binary_img is the binarized workspace; average each row to get a 1-D profile
hist = cv2.reduce(binary_img, 1, cv2.REDUCE_AVG).reshape(-1)
th = 1
H, W = binary_img.shape[:2]
# Rows where the profile rises above the threshold (line starts) and falls back (line ends)
uppers = np.array([y for y in range(H - 1) if hist[y] <= th and hist[y + 1] > th])
lowers = np.array([y for y in range(H - 1) if hist[y] > th and hist[y + 1] <= th])
```

The above code is just a glimpse of how line extraction works. To see the complete code, see extract_line.
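As a self-contained illustration of the same idea, the sketch below runs the row-profile approach on a synthetic image using NumPy only. The function name `find_line_bounds` and the test image are assumptions for illustration; `mean(axis=1)` stands in for `cv2.reduce` with `cv2.REDUCE_AVG`, and the image is assumed to have white ink on a black background with blank rows at the top and bottom.

```python
import numpy as np


def find_line_bounds(binary_img: np.ndarray, th: float = 1.0):
    """Return (uppers, lowers): the rows just before each text line starts and where it ends."""
    # Collapse every row to its mean intensity, giving a 1-D profile of length H
    profile = binary_img.mean(axis=1)
    inked = profile > th
    # Forward difference of the inked/blank mask: +1 marks a start, -1 marks an end
    change = np.diff(inked.astype(np.int8))
    uppers = np.where(change == 1)[0]
    lowers = np.where(change == -1)[0]
    return uppers, lowers


# Synthetic workspace: 40 rows with two "lines" of ink at rows 5-11 and 20-29
img = np.zeros((40, 100), dtype=np.uint8)
img[5:12, 10:90] = 255
img[20:30, 10:60] = 255

uppers, lowers = find_line_bounds(img)
for top, bottom in zip(uppers, lowers):
    # uppers holds the last blank row before a line, so the ink starts one row later
    print(f"ink from row {top + 1} to row {bottom}")
```

Each (upper, lower) pair can then be used to crop one line image out of the workspace before character segmentation.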
Character Segmentation and Exponential detection

After detecting all the lines, we send the extracted line images to the text_segment function, which uses OpenCV's findContours to segment the characters and sorts them using the sort_contours function described above, with method now set to left-to-right.

It is easy for us to say whether a given number is an exponent or not, but for the model it is not that easy. Assuming that exponents sit at least above the middle of the line, we can draw a baseline through the center of the line image; any character that lies above the baseline is considered an exponent.

Figure: Exponential detection

Optical Character Recognition

We can use the MNIST dataset for digits (28*28 pixels) and Kaggle's Handwritten Mathematical Symbols dataset for symbols (45*45 pixels) to train the model.

Figure: MNIST images

How is MNIST actually created?

1. Handwritten digits of 128 * 128 pixels are collected from 500 different writers.
2. A Gaussian filter is applied to the image to soften the edges.
3. The digit is then placed and centered in a square image, preserving the aspect ratio.
4. The image is then down-sampled to 28 × 28 pixels using bicubic interpolation.

Images of symbols are preprocessed in the same way as the MNIST digits before training. The reason for preprocessing is that the two datasets we have chosen have different characteristics, such as dimensions, thickness, and line width, which makes it hard for the deep learning model to find the patterns.
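Step 3 of the list above, centering a character in a square image without distorting it, can be sketched with NumPy alone. The helper name `center_in_square`, the `margin` parameter, and the sample glyph are assumptions for illustration, and the glyph is centered by its bounding box, which may differ from how the datasets were actually prepared.

```python
import numpy as np


def center_in_square(glyph: np.ndarray, margin: int = 2) -> np.ndarray:
    """Pad a 2-D glyph image to a square canvas without changing its aspect ratio."""
    h, w = glyph.shape
    side = max(h, w) + 2 * margin
    canvas = np.zeros((side, side), dtype=glyph.dtype)  # black background
    top = (side - h) // 2
    left = (side - w) // 2
    canvas[top:top + h, left:left + w] = glyph
    return canvas


# Hypothetical tall, narrow glyph, such as a handwritten "1"
glyph = np.full((30, 8), 255, dtype=np.uint8)
square = center_in_square(glyph)
print(square.shape)
```

The square result could then be blurred and down-sampled to 28 × 28, for example with cv2.GaussianBlur and cv2.resize using cv2.INTER_CUBIC, to complete steps 2 and 4.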
Preprocessing helps us reduce the variations among digits and symbols. A Deep Columnar Convolutional Neural Network (DCCNN) was trained on almost 60,000 images of digits and preprocessed symbols. DCCNN is a single deep and wide neural network architecture that offers near state-of-the-art performance, comparable to ensemble models, on various image classification challenges such as the MNIST, CIFAR-10, and CIFAR-100 datasets. This model achieved about 96% accuracy.

Figure: Deep Columnar Convolutional Architecture (DCCNN)

To see the training code, see DCCNN_training.ipynb.

Evaluation and drawing boxes

Evaluation is the last and the most important part. To solve any equation we can use Python's eval function, which parses the expression passed to it and runs it as a Python expression within the program.

This is an example of how eval works:

>>Enter the function y(in terms of x): 'x*(x+1)*(x+2)'
>>Enter the value of x: 3
>>print(y)
60

The steps involved in the evaluation process are:

1. Solve the given math question and save the answer.
2. Solve each handwritten line and compare its derived value with the stored answer.
3. Draw a green bounding box if the line is correct and a red one if the line is wrong.

Let's take an example. Say the question is to solve the equation A*x² + B*y where A=56, B=7, x=3 and y=13; the answer to this equation is 595 (56*3² + 7*13 = 595).

Figure: Sample workspace

A green box indicates the line is correct, whereas red indicates the line is wrong.
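Steps 1 and 2 of the evaluation process can be sketched as below. Note one deliberate swap: instead of calling eval directly, this sketch walks the parsed expression with Python's ast module and accepts only basic arithmetic, since recognized text is untrusted input. The names `safe_eval` and `grade_lines` are hypothetical, and the first recognized line is an assumed stand-in for what the OCR stage might produce (with x² written as `**2`).

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def safe_eval(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


def grade_lines(question: str, lines: list) -> list:
    """Return True (green box) or False (red box) for each recognized line."""
    answer = safe_eval(question)  # step 1: solve the question once
    return [safe_eval(line) == answer for line in lines]  # step 2: compare each line


# Hypothetical OCR output for the sample workspace
question = "56*3**2 + 7*13"
lines = ["56*3**2 + 7*13", "56*7 + 92", "595 + 92", "595"]
grades = grade_lines(question, lines)
print(grades)
```

For step 3, each True or False could be turned into a box with cv2.rectangle, using (0, 255, 0) for green and (0, 0, 255) for red in OpenCV's BGR order.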
The first and the last lines are correct: on solving these lines we get 595, which matches the actual answer.

The second line (56*7 + 92) is wrong. 3² is 9 but it is written as 7, and on solving the line we get 484, which is not equal to 595.

The third line (595 + 92) is also wrong: on solving this line we get 687, which again is not equal to 595.

Conclusion

Let's summarize. The scanned worksheet is sent to the workspace detection module, which returns all the rectangular workspaces in the given worksheet. The detected workspaces are then passed to the line extraction module, which extracts all the lines. The extracted lines are sent to the character segmentation module, which segments the characters, and the DCCNN deep learning model predicts each digit or symbol. Finally, the evaluation module assesses each line and draws a red or green bounding box.

Automating the grading process not only helps teachers but also creates a comfortable learning experience for students. This solution can be made even cooler by recognizing more complex mathematical equations such as differential and integral equations, recognizing cursive handwriting where the characters are not separated, detecting plagiarism, and recognizing chemical equations.

Thanks for reading till the end! If you have enjoyed this post, let me know by clapping, and I'd be very grateful if you'd help it spread by sharing it with your friends :).