Tensorflow, MNIST and your own handwritten digits

Ole Kröger · Jan 28, 2016

There are a lot of articles about MNIST and how to learn handwritten digits. So will this one be just another one? Nope, I'll use the newest available library, Tensorflow by Google. But they have their own MNIST article in their tutorial section here. Come on, so this one will be just another one? Really, nope: I didn't find any article which describes how to recognize digits as we normally write them. All digits in MNIST look like this:

[Image: sample digits from MNIST]

But no one writes with a perfect white pen on a black background. You will use your camera and take a photo where the digit isn't in the center. That will look something like this:

[Image: photos of the handwritten digits 8, 0, 4 and 3]

If you use that as the input for your neural net, you will get more or less random results. The last part of every article about MNIST is about the accuracy, which is something around >85%, and you will get something like 10% (random guessing among ten digits). How do you get this accuracy with your own handwritten digits?

The MNIST dataset: a small overview

The MNIST dataset is a dataset of handwritten digits which includes 60,000 examples for the training phase and 10,000 images in the test set. All images are size-normalized to fit in a 20x20 pixel box, and they are centered in a 28x28 image using the center of mass. This is important information for our preprocessing.

Tensorflow: library for machine learning by Google

Tensorflow is an open source software library for machine learning which provides a flexible architecture and can run on the GPU and CPU and on many different devices, including mobile devices. It's helpful to read the MNIST tutorial directly on their site here. Here is the code from the tutorial with some comments.

```python
# import tensorflow and the input_data script
import tensorflow as tf
import input_data
```

You can download the input_data class here.
```python
# create a MNIST_data folder with the MNIST dataset if necessary
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# a placeholder for our image data:
# None stands for an unspecified number of images
# 784 = 28*28 pixels
x = tf.placeholder("float", [None, 784])

# we need our weights for our neural net
W = tf.Variable(tf.zeros([784, 10]))
# and the biases
b = tf.Variable(tf.zeros([10]))

# softmax provides a probability-based output
# we need to multiply the image values x and the weights
# and add the biases
# (the normal procedure, explained in previous articles)
y = tf.nn.softmax(tf.matmul(x, W) + b)

# y_ will be filled with the real values
# which we want to train (digits 0-9)
# for an undefined number of images
y_ = tf.placeholder("float", [None, 10])

# we use the cross_entropy function
# which we want to minimize to improve our model
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

# use a learning rate of 0.01
# to minimize the cross_entropy error
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# initialize all variables
init = tf.initialize_all_variables()

# create a session
sess = tf.Session()
sess.run(init)

# use 1000 batches with a size of 100 each to train our net
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # run the train_step function with the given image values (x) and the real output (y_)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

Let's get the accuracy of our model: our model is correct if the index with the highest y value is the same as in the real digit vector. The mean of correct_prediction gives us the accuracy.
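The model and the accuracy check described above can be mirrored in a few lines of NumPy, which may help to see what TensorFlow computes: softmax(xW + b), the summed cross-entropy -Σ y_·log(y), and the accuracy as the mean of the argmax comparisons. This is a minimal sketch with made-up toy values (3 features and 3 classes instead of 784 and 10); softmax, cross_entropy and accuracy are illustrative helper names, not TensorFlow functions.

```python
import numpy as np


def softmax(logits):
    # subtract the row maximum for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(y, y_):
    # same formula as -tf.reduce_sum(y_ * tf.log(y)): summed over all images and classes
    return -np.sum(y_ * np.log(y))


def accuracy(y, y_):
    # a prediction is correct if the index of the highest probability
    # equals the index of the 1 in the one-hot label
    correct = np.argmax(y, axis=1) == np.argmax(y_, axis=1)
    return correct.astype(float).mean()


# toy example: 2 "images" with 3 features each and 3 classes
x = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
W = np.eye(3)    # hypothetical weights: feature k votes for class k
b = np.zeros(3)  # biases
y = softmax(x @ W + b)

y_ = np.array([[1.0, 0.0, 0.0],   # first image is a 0
               [0.0, 1.0, 0.0]])  # second image is a 1

print(accuracy(y, y_))  # 0.5: first row predicts 0 (correct), second predicts 2 (wrong)
print(cross_entropy(y, y_))
```

Subtracting the row maximum before np.exp leaves the softmax output unchanged but keeps np.exp from overflowing for large values.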
We need to run the accuracy function with our test set (mnist.test). We use the keys images and labels for x and y_:

```python
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
```

This gives us an accuracy of about 91%, which is pretty good for such a small number of lines. But now we want to use it to play with our real data.

Preprocessing

The most basic approach is to scale all our handwritten digits to a 28x28 pixel image.

```python
import math  # needed later for the padding step

import cv2
import numpy as np

# create an array where we can store our 4 pictures
images = np.zeros((4, 784))
# and the correct values
correct_vals = np.zeros((4, 10))

# we want to test our images which you saw at the top of this page
i = 0
for no in [8, 0, 4, 3]:
    # read the image as grayscale
    # (OpenCV 2.x also accepted cv2.CV_LOAD_IMAGE_GRAYSCALE here)
    gray = cv2.imread("img/blog/own_" + str(no) + ".png", cv2.IMREAD_GRAYSCALE)

    # invert the image (black background) and resize it
    gray = cv2.resize(255 - gray, (28, 28))

    # save the processed images
    cv2.imwrite("pro-img/image_" + str(no) + ".png", gray)

    # all images in the training set have a range from 0-1 and not from 0-255,
    # so we divide our flattened image (a one-dimensional vector
    # with our 784 pixels) to use the same 0-1 based range
    flatten = gray.flatten() / 255.0

    # store the flattened image and generate the correct_vals array;
    # correct_val for the first digit (8) would be [0,0,0,0,0,0,0,0,1,0]
    images[i] = flatten
    correct_val = np.zeros((10))
    correct_val[no] = 1
    correct_vals[i] = correct_val
    i += 1

# the prediction will be an array with four values,
# which show the predicted numbers
prediction = tf.argmax(y, 1)

# run the prediction and the accuracy function
# using our generated arrays (images and correct_vals)
print(sess.run(prediction, feed_dict={x: images, y_: correct_vals}))
print(sess.run(accuracy, feed_dict={x: images, y_: correct_vals}))
```

If you run this basic approach, you will get an accuracy of 0.25 and a prediction
of something like [3 5 2 3]. The net will be slightly different every time you run the script, so your prediction may differ, but it's wrong most of the time. Okay, it's quite obvious that the images don't look like the trained ones. These are white digits on a gray background, not on a black one. Therefore we need to add the following line directly after the line where we resize the image (gray = cv2.resize(255-gray, (28, 28))):

```python
# with cv2.THRESH_OTSU, OpenCV picks the threshold itself, so the value 128 is ignored
(thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```

This gives us an accuracy of 50%, because the 0 is already pretty well centered.

Now let's go back to the first sentences of the "MNIST" section in this entry: "All images are size-normalized to fit in a 20x20 pixel box, and they are centered in a 28x28 image using the center of mass." This is important information for our preprocessing.

First we want to fit the images into this 20x20 pixel box. Therefore we need to remove every row and column at the sides of the image which is completely black.

```python
# strip completely black rows and columns from all four sides
# (assumes the image contains at least one non-black pixel)
while np.sum(gray[0]) == 0:
    gray = gray[1:]

while np.sum(gray[:, 0]) == 0:
    gray = np.delete(gray, 0, 1)

while np.sum(gray[-1]) == 0:
    gray = gray[:-1]

while np.sum(gray[:, -1]) == 0:
    gray = np.delete(gray, -1, 1)

rows, cols = gray.shape
```

Now we want to resize our outer box to fit it into a 20x20 box. We need a resize factor for this.

```python
# scale the longer side to 20 pixels and keep the aspect ratio
if rows > cols:
    factor = 20.0 / rows
    rows = 20
    cols = int(round(cols * factor))
    gray = cv2.resize(gray, (cols, rows))  # cv2.resize expects (width, height)
else:
    factor = 20.0 / cols
    cols = 20
    rows = int(round(rows * factor))
    gray = cv2.resize(gray, (cols, rows))
```

But at the end we need a 28x28 pixel image, so we add the missing black rows and columns using the np.pad function, which adds 0s to the sides.

```python
colsPadding = (int(math.ceil((28 - cols) / 2.0)), int(math.floor((28 - cols) / 2.0)))
rowsPadding = (int(math.ceil((28 - rows) / 2.0)), int(math.floor((28 - rows) / 2.0)))
gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')
```

In our really small test set of only four images, we are getting an accuracy of 50%.
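The cropping, the 20x20 fit and the 28x28 padding above are mostly index arithmetic. Here is a small NumPy-only sketch of that arithmetic on a synthetic image: it computes the target size instead of calling cv2.resize, and it finds the bounding box with np.nonzero rather than the while loops (which raise an IndexError on a completely black image). crop_to_content, fit_size and padding are hypothetical helper names, not part of the original code.

```python
import math

import numpy as np


def crop_to_content(img):
    # bounding box of all non-black pixels; an alternative to the while loops
    # that strip completely black rows and columns one at a time
    row_idx = np.nonzero(img.any(axis=1))[0]
    col_idx = np.nonzero(img.any(axis=0))[0]
    if row_idx.size == 0:
        raise ValueError("image is completely black")
    return img[row_idx[0]:row_idx[-1] + 1, col_idx[0]:col_idx[-1] + 1]


def fit_size(rows, cols, box=20):
    # scale the longer side to `box` pixels and keep the aspect ratio
    if rows > cols:
        factor = box / rows
        return box, int(round(cols * factor))
    factor = box / cols
    return int(round(rows * factor)), box


def padding(rows, cols, size=28):
    # split the missing pixels between both sides; for an odd difference
    # the extra pixel goes to the top/left (math.ceil), as in the article
    rows_padding = (int(math.ceil((size - rows) / 2.0)), int(math.floor((size - rows) / 2.0)))
    cols_padding = (int(math.ceil((size - cols) / 2.0)), int(math.floor((size - cols) / 2.0)))
    return rows_padding, cols_padding


# a 28x28 black image with a white 9x5 block (rows 3-11, columns 8-12)
img = np.zeros((28, 28), dtype=np.uint8)
img[3:12, 8:13] = 255

cropped = crop_to_content(img)
print(cropped.shape)               # (9, 5)
rows, cols = fit_size(*cropped.shape)
print(rows, cols)                  # 20 11
rows_padding, cols_padding = padding(rows, cols)
print(rows_padding, cols_padding)  # (4, 4) (9, 8)

# padding a 20x11 stand-in for the resized digit yields a 28x28 image again
padded = np.pad(np.ones((rows, cols)), (rows_padding, cols_padding), 'constant')
print(padded.shape)                # (28, 28)
```

The odd difference 28 - 11 = 17 is split into 9 columns on the left and 8 on the right, which is what the math.ceil/math.floor pair does.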
The next step is to shift the inner box so that it is centered using the center of mass. We need two functions for this last step. The first one gets the center of mass, using the center_of_mass function from the ndimage module of SciPy, so we need to add from scipy import ndimage at the beginning of our code.

```python
from scipy import ndimage


def getBestShift(img):
    # (row, column) of the intensity-weighted center; ndimage.center_of_mass is
    # the same function as the older ndimage.measurements.center_of_mass
    cy, cx = ndimage.center_of_mass(img)

    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)

    return shiftx, shifty
```

The second function shifts the image in the given directions. The warpAffine function is explained here. Our transformation matrix M = [[1, 0, sx], [0, 1, sy]] moves every pixel by sx along x and by sy along y.

```python
def shift(img, sx, sy):
    rows, cols = img.shape
    # translation matrix: sx pixels horizontally, sy pixels vertically
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    # output has the same size as the input; uncovered pixels become black
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted
```

After our last line in the for loop:

```python
gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')
```

we need to add these lines to use the defined functions:

```python
shiftx, shifty = getBestShift(gray)
shifted = shift(gray, shiftx, shifty)
gray = shifted
```

And finally we have an accuracy of 100%. Of course we used a very small test set, but you can see the changes after each preprocessing step. The neural net wasn't trained to recognize arbitrary handwritten digits, only digits that are preprocessed like the MNIST images. If you don't want to preprocess your images, you need far more training images, which would take much more time.

You can download all the code from my OpenSourcES GitHub repo. You might be interested in another article on this topic about whole math formulas: https://medium.com/@o.kroeger/handwritten-equations-to-latexab0ed30036d3

Originally published at opensourc.es.
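The centering step can also be sketched in plain NumPy, which makes its arithmetic visible: the center of mass is the intensity-weighted mean of the row and column indices (what ndimage.center_of_mass returns), and for whole-pixel shifts the cv2.warpAffine translation should amount to copying the image into a black array at an offset. center_of_mass, best_shift and shift_image are hypothetical helpers written for illustration, and this version only supports integer shifts.

```python
import numpy as np


def center_of_mass(img):
    # intensity-weighted mean of the row and column indices, as (row, column)
    total = img.sum()
    rows, cols = np.indices(img.shape)
    cy = (rows * img).sum() / total
    cx = (cols * img).sum() / total
    return float(cy), float(cx)


def best_shift(img):
    # same arithmetic as getBestShift: move the center of mass to the image center
    cy, cx = center_of_mass(img)
    rows, cols = img.shape
    shiftx = int(np.round(cols / 2.0 - cx))
    shifty = int(np.round(rows / 2.0 - cy))
    return shiftx, shifty


def shift_image(img, sx, sy):
    # whole-pixel translation: pixel (r, c) moves to (r + sy, c + sx);
    # pixels pushed past the border are dropped, uncovered pixels stay black
    rows, cols = img.shape
    out = np.zeros_like(img)
    if abs(sx) >= cols or abs(sy) >= rows:
        return out
    src_r = slice(max(0, -sy), rows - max(0, sy))
    src_c = slice(max(0, -sx), cols - max(0, sx))
    dst_r = slice(max(0, sy), rows - max(0, -sy))
    dst_c = slice(max(0, sx), cols - max(0, -sx))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out


# a 28x28 black image with a white 3x3 square at rows 2-4, columns 3-5
img = np.zeros((28, 28))
img[2:5, 3:6] = 1.0

print(center_of_mass(img))       # (3.0, 4.0)
sx, sy = best_shift(img)
print(sx, sy)                    # 10 11
centered = shift_image(img, sx, sy)
print(center_of_mass(centered))  # (14.0, 14.0)
```

As in getBestShift, the target is (rows/2, cols/2), so the square ends up at rows and columns 13-15 with its center of mass at (14.0, 14.0).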